How a computing powerhouse delivers health care insights
Researchers at Oak Ridge National Lab used three diverse high-performance computing architectures to analyze publicly available health-related datasets.
Health datasets vary in size by many orders of magnitude, but few are as large as the public health big data being gathered and analyzed at the Energy Department’s Oak Ridge National Lab.
About four years ago, ORNL decided to amass as much public health care data as it could and subject it to the analytics engines of its most powerful computers.
“We were in a unique position with our leadership computing resources and data science expertise, and we saw an opportunity to use health data to discover data-driven insights for better health care quality, integrity and policy,” said Sreenivas Sukumar, a researcher in ORNL’s computational sciences division.
To analyze the datasets, ORNL researchers used the lab’s high-performance computing resources: the multicore Titan, the second-most powerful computer in the world; Apollo, an in-memory Urika graph computer built by Yarcdata; and distributed cloud computing-based machines.
The lab also tapped some of the biggest producers of health-related data, including the Cancer Genome Atlas, clinicaltrials.gov, Semantic MEDLINE, openFDA, DocGraph and the National Plan and Provider Enumeration System.
In working with the data, the researchers initially encountered computing silos created by existing information architectures that did not scale to the analytics processing requirements of the large datasets. Consequently, the lab turned to an approach using graph computing, a scalable computing solution capable of uncovering relationships hidden in the data.
The graph computing almost immediately yielded insights into some of the datasets, including a better understanding of fraud, waste and abuse within the federal health care system, according to ORNL researchers.
In one case, the lab was able to identify a health care provider using multiple identities to bill patients. Another case showed guilt-by-association patterns that highlighted the potential for fraud before the provider began billing.
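The article does not describe the lab's actual algorithms, so the following is only a minimal sketch of the general idea, using made-up records and hypothetical names (`RECORDS`, `build_graph`, `connected_providers`, `associates_of`). It assumes provider identities can be linked through shared attributes such as a phone number or address: identities that fall into the same connected component may belong to one provider, and a provider who shares an attribute with a flagged identity is a guilt-by-association candidate.

```python
from collections import defaultdict

# Hypothetical, made-up records: (provider_id, attribute_type, attribute_value).
RECORDS = [
    ("P1", "phone", "555-0100"),
    ("P1", "address", "12 Elm St"),
    ("P2", "phone", "555-0100"),
    ("P2", "address", "98 Oak Ave"),
    ("P3", "address", "98 Oak Ave"),
    ("P4", "phone", "555-0199"),
]


def build_graph(records):
    """Build an undirected graph linking provider nodes to attribute nodes."""
    graph = defaultdict(set)
    for provider, kind, value in records:
        p_node = ("provider", provider)
        a_node = (kind, value)
        graph[p_node].add(a_node)
        graph[a_node].add(p_node)
    return graph


def connected_providers(graph):
    """Return the provider IDs found in each connected component."""
    seen = set()
    groups = []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(sorted(v for k, v in component if k == "provider"))
    return sorted(groups)


def associates_of(graph, flagged):
    """Providers that share at least one attribute with a flagged provider."""
    result = set()
    for pid in flagged:
        for attr in graph[("provider", pid)]:
            # Neighbors of an attribute node are always provider nodes.
            for _kind, other in graph[attr]:
                if other not in flagged:
                    result.add(other)
    return sorted(result)


graph = build_graph(RECORDS)
groups = connected_providers(graph)
# Components with more than one identity are candidates for manual review.
linked = [g for g in groups if len(g) > 1]
```

In this toy data, P1, P2 and P3 end up in one component because they are chained together by a shared phone number and a shared address, while P4 stands alone. A shared attribute is only a lead for review, not evidence of fraud.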
Georgia Tourassi, director of ORNL’s Health Data Sciences Institute (HDSI), said ORNL’s approach is novel in health care. Big data computing capabilities in facilities such as ORNL “are critical to health care delivery,” Tourassi said. “It’s a paradigm shift in an environment that has always been reactive.”
HDSI is reaching out to partners who have different types of data and diverse needs for data analysis – such as genomics, electronic health records and health-sensor data. The projects will help collect, store, integrate and analyze data in support of next-generation personalized medicine care, said the researchers.
For example, ORNL is building the capability for clinical experts to “semantically reason” with medical records and associate health data types (such as claims and clinical records) while simulating the outcomes of different clinical interventions.
“We know for certain that health data will be getting bigger and more complex as the practice of medicine expands and progresses,” said Tourassi. “By being involved and leveraging the investment, we can anticipate and prepare for the next bottleneck.”