COVID cloud data fuels virus studies
Data from the National COVID Cohort Collaborative (N3C), a cloud-based repository of COVID-19 data built last year, is fueling studies about the virus.
For instance, a study on predicting COVID severity, published July 13, accessed data through the N3C Data Enclave, a secure platform that stores harmonized data provided by contributing members. The enclave offers three tiers of data access based on the scope and nature of the research, and every request also requires approval by N3C’s Data Access Committee.
The recent study used data on 174,568 adults with COVID-19 and determined that machine learning (ML) models could accurately predict how sick patients would become based on clinical data commonly collected during the first 24 hours of a hospital admission.
Developed in 2020 by a group of researchers funded by the National Institutes of Health to facilitate COVID-related information sharing, N3C collects medical records from patients nationwide. As of July 15, the repository held data on 6.5 million people, about 2.2 million of them COVID-positive, along with 3.5 billion lab results and 1.2 billion medication records.
To protect privacy, the repository holds only limited, deidentified and synthetic data sets, and the data is anonymized by an honest data broker working under a separate contract.
The enclave is hosted in Palantir’s Foundry platform-as-a-service environment, which resides in Amazon Web Services GovCloud and is authorized at a moderate impact level by the Federal Risk and Authorization Management Program, NIH officials said in the program’s FAQ.
According to the research paper, “The N3C has unique features that distinguish it from other COVID-19 data resources,” features that make it amenable to ML. “First, it harmonizes data from a very large number of clinical sites (86 had signed data transfer agreements as of March 30, 2021), which is important because significant site-level variation in critical metrics, such as invasive ventilatory support and mortality, has been reported.”
Second, as a centralized repository, N3C ensures that its data is robust, representative and of high quality across sites. That matters because many reports on clinical details, treatments and outcomes come from a single hospital or health care system in one geographic region.
Third, efforts to collect data early in the crisis may not have been designed to support future research, the report stated. By contrast, N3C “provides transparent, easily shared, versioned, and fully auditable data and analytic provenance.”
The study concluded that “N3C is a nationally representative, transparent, reproducible, harmonized data resource that enables effective and efficient collaborative observational COVID-19 research.”
More than 200 other N3C-powered projects are under way, including assessments of COVID reinfection risk, ML predictions of the virus’s impact on pregnant women and studies of racial disparities.
Under the supervision of NIH’s National Center for Advancing Translational Sciences, the enclave has data from more than 70 sites nationwide and is one of the largest collections of clinical data related to COVID-19 symptoms and outcomes in the United States.
When N3C launched, NIH said the only identifying data on the platform would be the ZIP code of the health care group providing the information and the dates of service. What’s more, data access is open to all approved users, whether or not they contribute data, though they must analyze the data within the cloud platform; it cannot be removed or downloaded.
Other government COVID research efforts include the COVID-19 Insights Partnership, announced July 28, 2020, which aims to create a framework for the departments of Health and Human Services and Veterans Affairs to use high-performance computing and artificial intelligence resources at the Energy Department. That partnership expanded on the COVID-19 High Performance Computing Consortium, which brings “together the federal government, industry and academic leaders to provide access to the world’s most powerful high-performance computing resources in support of COVID-19 research.” Additionally, VA added a COVID-19 questionnaire for participants in its Million Veteran Program to collect information about their experience with the virus.