NCI cracks open cancer research by moving it to the cloud

 


The agency is looking for cloud-based architectures to give cancer researchers complete bioinformatics infrastructure as a service, with built-in compute, storage, security and analytics.

The National Cancer Institute (NCI) is at the bleeding edge of a trend toward using a cloud-based infrastructure to process, store and analyze its massive data sets. 

6 Key Issues for NCI Cancer Clouds

The biomedical research community has identified six issues that need to be addressed in the NCI Cloud Initiative:

  1. Data access
  2. Computing capacity
  3. Data interoperability
  4. Training
  5. Usability
  6. Governance

In 2014, NCI plans to award three contracts for pilot projects to create cloud computing environments that will house a new data set — called the Cancer Genome Atlas — that is expected to be 2.5 petabytes in size. The so-called cancer clouds will not only serve as a central repository for the genomics data but also will provide a standard application programming interface (API) and tools for researchers to use in analyzing the data. 

NCI’s goal for the cloud initiative is to democratize access to cancer genomics data to advance the treatment of cancer. The agency’s biomedical data sets are becoming so large that most universities and pharmaceutical companies cannot afford the computing power and network bandwidth needed to download the data sets and process them locally, as is common practice today.

Instead, NCI is looking for cloud-based architectures that would allow researchers to access the data sets via a Web browser and analyze them remotely. Researchers would then need only the local bandwidth and processing power required to download the results, which will be much smaller data sets.
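To put the bandwidth problem in numbers, here is a rough back-of-the-envelope sketch of how long it would take to download a 2.5-petabyte data set. The link speeds are illustrative assumptions, not figures from NCI, and the calculation ignores protocol overhead and storage limits.

```python
# Rough transfer-time arithmetic for a 2.5-petabyte data set.
# The link speeds below are illustrative assumptions, not figures from NCI.

PETABYTE_BYTES = 10**15  # decimal petabyte


def transfer_days(size_bytes: float, link_gbps: float) -> float:
    """Days needed to move size_bytes over a sustained link of link_gbps gigabits per second."""
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 86_400  # seconds per day


dataset_bytes = 2.5 * PETABYTE_BYTES

for gbps in (0.1, 1.0, 10.0):  # hypothetical campus link speeds
    print(f"{gbps:>5} Gbps: {transfer_days(dataset_bytes, gbps):,.0f} days")
```

Under these assumptions, a fully dedicated 1 Gbps link would need about 231 days to pull down the whole data set, which is why downloading only the results of a remote analysis is so much cheaper.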

"We want the smart graduate student or the smart postdoc who has an idea for a novel way of analyzing data to be able to relatively quickly write a piece of software to do analysis on the data and run it inexpensively," said George Komatsoulis, Interim Director of NCI’s Center for Biomedical Informatics and Information Technology. "Now this person would need $2.5 million of hardware and wouldn’t generate an answer for a year."

Fundamental shift in research

Komatsoulis says the bottom-line return on the NCI Cloud Initiative will be better outcomes for cancer patients. "By making this investment, we are making the data that we collect more useful, more widely available, and we will get a much larger group of people looking at it for clues to how we can improve the treatment of cancer," he said.

By creating a shared repository of data that is open to all researchers, the NCI Cloud Initiative represents a fundamental shift in cancer research.

"The mindset is really changing," says Subha Madhavan, Director of the Innovation Center for Biomedical Informatics at Georgetown University Medical Center. "What we’ve been doing for the last 10 years is bring the data to the tools, which were on client/server architectures at universities and pharmaceutical research labs. But you can’t do that anymore because the majority of clinical projects produce data sets that are terabytes in scale. So the mindset is changing to bring the tools and expertise to the data. It’s a shared computing model that’s emerging."

Prof. Jake Yue Chen, Associate Professor of Bioinformatics at Indiana University, says the NCI Cloud Initiative represents a significant advancement for biomedical research.

"This is very profound. It’s like putting genomic information that’s buried in the ivy towers of a few major research centers and putting it into the hands of every cancer researcher. The more people who can analyze this data, the more insights we are going to get into cancer," Chen said. "I would imagine that orders of magnitude of knowledge would be generated because more researchers will be able to analyze the data."

Bioinformatics infrastructure as a service

The driver behind the NCI Cloud Initiative is the Cancer Genome Atlas, which will provide in-depth data on about 11,000 cancer patients, with an average of 500 gigabytes of data per patient. 

"We’re going to have the DNA sequence for their tumor and the matched normal control," Komatsoulis explained. "There is RNA sequencing, medical images and clinical data. In addition, there is epigenetics, which are modifications to the DNA itself that impact the way various genes that exist in these patients get turned on and turned off. By September 2014, we expect to have generated, if not fully received, 2.5 petabytes of data. This is the shape of things to come."

The NCI cancer clouds will provide a complete bioinformatics infrastructure as a service, with built-in compute, storage, security and analytics. NCI has not defined the cloud-based infrastructure that it wants; instead, it is looking for innovative architectures that will meet the needs of the biomedical research community. 

"We really don’t know yet what is the best technology or what is the best way to structure the data so it can be computed on efficiently," Komatsoulis says. "What we’re looking for is innovation. This is one of those cases where the government has the opportunity to enable the scientific community to innovate to solve an important problem."

NCI plans to fund three different architectures for the pilot project, with three years of funding for each team. The agency is hoping that these architectures will scale, given that it expects data sets as large as 20 to 50 petabytes by 2019.

"The pilot projects will allow us to evaluate with a really big data set what is and what isn’t the most effective architecture for doing the kinds of analysis that scientists are interested in doing," Komatsoulis said. "It’s our intention to test these clouds to make sure they meet our performance requirements, but also to throw them open to cancer researchers who can vote with their feet."

Challenges in lowering the barriers to entry

Komatsoulis said a key challenge is creating an efficient API that preserves the security and integrity of the data. Experts said the data is likely to be encrypted both in transit and at rest, and that authentication and access controls must be applied to it.

"We’re looking to lower the barrier to entry for scientists," Komatsoulis said. "One of the purposes of having a solid API is that it gives us the opportunity to embed the security best practices in all of the programs."
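As a minimal sketch of what embedding security in an API could look like, the example below routes every request through one entry point that authenticates the caller and checks access rights before any data is touched. NCI has not defined its API; the user names, data set names, secret key and function names here are all hypothetical, and the sketch covers only authentication and access control, not encryption in transit or at rest.

```python
import hashlib
import hmac

# Hypothetical values for illustration only; not part of any NCI system.
SECRET_KEY = b"demo-secret-not-for-production"
ACCESS_LIST = {
    "alice": {"open-data"},
    "bob": {"open-data", "controlled-data"},
}


def sign(user: str) -> str:
    """Issue a token: an HMAC-SHA256 signature over the user name."""
    return hmac.new(SECRET_KEY, user.encode(), hashlib.sha256).hexdigest()


def authenticate(user: str, token: str) -> bool:
    """Verify the token using a constant-time comparison."""
    return hmac.compare_digest(sign(user), token)


def query(user: str, token: str, dataset: str) -> str:
    """Single entry point: authenticate and authorize before running anything."""
    if not authenticate(user, token):
        raise PermissionError("authentication failed")
    if dataset not in ACCESS_LIST.get(user, set()):
        raise PermissionError(f"{user} may not read {dataset}")
    return f"results for {dataset}"  # placeholder for a real analysis call


print(query("bob", sign("bob"), "controlled-data"))
```

Because researchers' programs would call only `query`, none of them would need to reimplement the checks, which is one way to read the idea of building best practices into the API itself.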

The compute and storage capabilities required by the NCI Cloud Initiative are available on the market today, Chen said. But he added that it will be a challenge to integrate these technologies in a massive shared repository that can meet the needs of the biomedical research community. 

Another issue is standardization of data and frameworks for analyzing data, as there is variability in the terminology that cancer researchers use. 
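A small sketch can illustrate the kind of terminology mapping this implies: different labs' labels for the same diagnosis are translated to one shared term before analysis. The synonym table and record layout are invented for illustration; a real project would rely on a curated vocabulary.

```python
# Hypothetical synonym table; a real project would use a curated vocabulary.
CANONICAL_TERMS = {
    "breast carcinoma": "breast cancer",
    "carcinoma of the breast": "breast cancer",
    "crc": "colorectal cancer",
    "colorectal carcinoma": "colorectal cancer",
}


def normalize_term(term: str) -> str:
    """Lowercase, collapse whitespace, then map known synonyms to one term."""
    key = " ".join(term.lower().split())
    return CANONICAL_TERMS.get(key, key)


def normalize_record(record: dict) -> dict:
    """Return a copy of the record with its 'diagnosis' field standardized."""
    return {**record, "diagnosis": normalize_term(record["diagnosis"])}


records = [
    {"id": 1, "diagnosis": "CRC"},
    {"id": 2, "diagnosis": "Breast  Carcinoma"},
]
for rec in records:
    print(normalize_record(rec))
```

Unrecognized terms pass through in cleaned-up form rather than being dropped, so gaps in the table stay visible to whoever maintains it.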

"One challenge is usability and user friendliness," Madhavan said. "It’s one thing for computer scientists and biomedical informatics specialists to be able to hack our way through the data, but we also need bench researchers and physicians to be consumers of this data. We’ve got to deliver the information through easy-to-use mobile health and Web-based platforms."

The NCI Cloud Initiative is part of a trend in which biomedical research data is processed in public clouds. For example, the 1000 Genomes database is available via Amazon’s Elastic Compute Cloud. Similarly, Georgetown University Lombardi Cancer Center is using Amazon Web Services for gene sequencing related to breast and colorectal cancers.

"Our IT team is small. It would take years for us to set up an infrastructure to manage terabytes of data. But in a matter of weeks, we can set up our data on Amazon’s cloud," Madhavan said. "The cloud is a game changer for researchers like ours who want to do big data analysis."

Industry analysts said they expect to see more government-sponsored big data projects adopt a cloud infrastructure for compute, storage and analytics. 

"This sounds like a perfect example where cloud computing is a better arrangement given the bandwidth limits associated with downloading large data sets," said Shawn McCarthy, research director at IDC Government Insights. "Putting data in a shared resource is becoming more popular because you can standardize the data. When everybody builds their own databases, you end up with different APIs and data name fields that are different. People spend more time normalizing the data than doing their analysis of it."
