America’s Entire Understanding of the Pandemic Was Shaped by Messy Data

COMMENTARY | Our view of this crisis has been blurry from the beginning.

This story was originally published by The Atlantic.

To understand any data set, you have to understand the way its information is compiled. That’s especially true for a patchwork data set such as the one composed of U.S. COVID-19 data, which is the product of 56 smaller systems belonging to each state and territory in the country.

In our year of working with COVID-19 data, we focused our attention on these systems and found that much of the information they produced reflected their individual structures. This reality runs parallel to the country’s biggest public-health-data challenge: The data pipelines that so deeply affected the pandemic’s trajectory were not given the decades of support, financial and otherwise, needed to perform well under pressure. Instead, a novel threat arrived, and the data response we saw was fragmented, unstandardized, and limited by the constraints of the reporting systems.

In this post, we’ll offer a summary of how states reported the five major COVID-19 metrics—tests, cases, deaths, hospitalizations, and recoveries—and a look at how reporting complexities shaped our understanding of the pandemic.

Tests

Before the COVID-19 pandemic, the CDC had never collected comprehensive national testing data for any infectious disease in the United States. But last March, as COVID-19 began to spread throughout the country, the number of tests conducted became the most crucial data point with which to understand the pandemic. Without it, we couldn’t understand whether or where low case counts were just an artifact of inadequate testing.

So, last April, the CDC partnered with the Association of Public Health Laboratories (APHL) to start the COVID-19 Electronic Laboratory Reporting Program (CELR), which would eventually collect detailed COVID-19 testing data from every state. While the federal government and APHL onboarded every state to CELR, which took just over a year, the COVID Tracking Project stepped in to compile a national testing count from state health-department websites. Like the CDC, states had never collected data at the scale the pandemic demanded, and as a result, all testing data were incomplete and unstandardized.

The pandemic exposed the extent to which the United States’ crucial but chronically underfunded laboratory-data infrastructure was at the mercy of the fax machine, with much manual data failing to make it into state counts or causing distortionary effects, such as data dumps. In addition, as nontraditional settings such as schools and nursing homes started administering antigen tests, states lost sight of how many of these COVID-19 tests had been conducted—opening a hole in our understanding of U.S. testing volume as antigen testing took off in the fall. Laboratories unaccustomed to collecting demographic data failed to collect information on the race and ethnicity of many people seeking COVID-19 testing, even though federal guidance required it.

The way states reported testing information was dictated by these difficulties they faced in collecting it, and because each state had slightly different weak spots, reporting was unstandardized. Some states reported just electronically transmitted lab results, while others reported faxed data too. Some states reported antigen tests (or early on, antibody tests) combined with PCR-test data, some separated them out, and some states didn’t report them at all. Race and ethnicity data were highly incomplete and unstandardized, impeding efforts to understand the pandemic’s disproportionate effect on Black, Latino, and Indigenous communities.

Of all the inconsistencies across states, one extraordinarily daunting problem that did improve over the course of the pandemic was the variation in testing units. For much of the pandemic, some states chose (or had only the capability) to count the number of unique people tested rather than the number of tests conducted. Because individuals are likely to receive multiple tests for COVID-19 over time, states counting people rather than tests appeared to be doing much less testing than others, throwing off measures used to contextualize case counts, such as test positivity. By the end of our data collection, all but two jurisdictions had standardized on counting tests rather than people, although there are still some variations in how states count tests.
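To make the unit problem concrete, here is a minimal sketch in Python of how the same week of testing could yield two different positivity figures depending on the denominator. Every number is hypothetical and chosen only for illustration, the function name `positivity` is our own, and for simplicity the sketch assumes the count of positives is the same under both units, which would not hold exactly in practice.

```python
def positivity(positives: int, denominator: int) -> float:
    """Return the share of the denominator that came back positive."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return positives / denominator


# Hypothetical figures for one state over one week (not real data).
positive_results = 500
tests_conducted = 10_000       # every test counted, including repeat tests
unique_people_tested = 6_000   # each person counted once, however often tested

by_tests = positivity(positive_results, tests_conducted)
by_people = positivity(positive_results, unique_people_tested)

print(f"positivity with tests as the unit:  {by_tests:.1%}")
print(f"positivity with people as the unit: {by_people:.1%}")
```

Under these assumed numbers, the state reports a positivity of 5.0% if it counts tests and 8.3% if it counts people, even though nothing about the underlying outbreak differs. Comparing the two figures across states as if they measured the same thing would be misleading.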

Only the CDC ever stood a chance at collecting testing data that were standardized across jurisdictions. But the federal government has faced its own share of problems in putting together a national testing data set. When federal testing data were first published last May, many states still had not started submitting data to CELR, leading to a data set that was highly divergent from state data because it had different sourcing. And even now, with every state onboarded to CELR, many states show persistent data-quality issues in their federally published data, which have caused continued disparities with their state-published data.

Throughout the pandemic, both state and federal testing data were treated by health officials and politicians as having a precision and comparability that they simply did not have. State test positivity became the basis of travel ordinances and reopening decisions; federal test positivity was used to inform the federal response. Both came with scant acknowledgment of their respective data-quality problems, instead creating a din of conflicting information that damaged public trust.

Testing is also the base of the data pipeline for all the other metrics: Many people sought testing for COVID-19 without visiting a clinician, meaning state health departments had to rely on labs sending them test information, without the option of getting additional data from doctors. As a result, the weakness of testing pipelines ended up impeding the collection of all other COVID-19 metrics.

Cases

Cases are one of the few COVID-19 metrics for which the federal government has issued clear data standards, but the paths states took toward implementing and adhering to these standards varied greatly. These state-specific paths are important to study, because without a standardized way to define a COVID-19 case, making sound comparisons across states or producing a national summary was not always easy.

Testing sits at the heart of these case-identification problems. When PCR tests aren’t available—when manufacturing is delayed, when distribution lags, when access to testing sites is limited, and when incentives to seek testing are strained—it becomes crucial to establish another way to build a count. We know that in the first months of the pandemic, probable-case-identification gaps were especially profound. The CDC’s first probable-case definition was difficult for state health departments to work with in practice, because it depended on slow processes such as contact tracing. And states were slow to start publicly reporting probable cases. As a result, early probable-case counts severely underestimated the number of people likely to have COVID-19.

As states built up their testing programs, and especially as antigen tests began to be deployed as a tool for identifying probable COVID-19 cases, the data grew more and more able to capture a fuller picture of the pandemic. Still, challenges remain. Of the 56 U.S. states and territories we tracked, at least five still report confirmed case numbers only, without disclosing any information about probable cases; a handful more lump probable cases in with their confirmed case counts or don’t make case definitions clear.

What’s more, because the data-reporting pipelines needed to send antigen test results to state health officials are brand new, we know that huge numbers of positive antigen test results still never appear in state case counts, just as they never make it into test counts.

Deaths

Like many other countries, the U.S. ended up having two different death counts for COVID-19: the slower but more definitive count released by the CDC’s National Center for Health Statistics, and a more timely one compiled from state data.

At the start of the pandemic, the NCHS significantly sped up its process to release provisional death-certificate data on deaths due to COVID-19. However, because the provisional death-certificate data are charted by date of death, recent weeks display a significant taper effect that can be confusing without good documentation. And NCHS data, because they undergo a federal review, have generally (but not always) moved more slowly than state counts.

For a more up-to-date picture of mortality, you can turn to state data, which the CDC scraped from state dashboards to assemble its own count of COVID-19 deaths. However, at the pandemic’s worst moments, there were still more people dying of COVID-19 than most states’ death-reporting infrastructures could handle. Not only did this problem lead to lags in the data; it also caused delays in issuance of death certificates, which sometimes blocked the relatives of those who had died from receiving health-care coverage or benefits.

The CDC did not issue any guidance about how states should track COVID-19 deaths, leading to a lack of standardization in how states defined the number. Some states counted deaths of individuals who had been identified as having a case of COVID-19, some states counted individuals whose death certificates listed COVID-19, and many used a combination of the two. Generally, states seemed to choose the method that allowed them to collate numbers most quickly within the constraints of their case surveillance and death infrastructures. And though it’s a common refrain that “deaths among cases” might overcount COVID-19 deaths, states using that method ended up, on average, undercounting NCHS death-certificate data by the same amount as states using death certificates.

Though these two methods ended up counting deaths at roughly the same speed and comprehensiveness, the federal government did not properly explain that states used different processes to count COVID-19 deaths. Instead, at different times, the CDC seemed conflicted about the definition of the count, saying in its data FAQ that state numbers represent deaths among cases identified according to the Council of State and Territorial Epidemiologists definition, and in a statement to us that the counts represent death-certificate data. And because states did not receive any guidance from the CDC on how to report deaths, not all states initially chose their counting methods with an eye toward speed. As a result, some had to switch to faster methods for counting deaths midway through the pandemic, causing significant confusion and sometimes public distrust when numbers abruptly changed.

Hospitalizations

As with other COVID-19 metrics, definitional differences hampered hospitalization-data reporting across the country. There was little standardization in how states reported current or cumulative patients, patients with confirmed or suspected cases, and pediatric cases. Many states didn’t readily define metrics on their websites, and many hospitals simply weren’t providing data.

In July, confusion grew when the Trump administration issued a sweeping order that fundamentally changed how COVID-19 hospitalization data were being compiled. In addition to reporting information to state health departments, hospitals across the country were suddenly directed to report COVID-19 numbers to the U.S. Department of Health and Human Services, which oversees the CDC, instead of reporting to the CDC directly.

At first, the switch was rocky, to say the least. (We wrote about the initial effects on the data here.) But as we watched hospitalization data closely over the second half of 2020, studying it to see how it tracked with numbers we were gathering from states themselves, we saw that the new protocol had patched the places where crucial data had been missing. In fact, current hospitalization data grew to be so reliably well reported, and federal data tracked with ours so closely, that the metric became a kind of lodestar in our understanding of the pandemic.

Finally, in November, we decided to remove the “cumulative hospitalization” metric from our website. We knew that data from the early months of the pandemic were drastically incomplete, and we had watched as many states’ cumulative totals sat stagnant for weeks, while their current hospitalization numbers fluctuated. Additionally, 20 states never reported cumulative hospitalizations, making the national sum a large undercount. Ultimately, we decided that reporting the cumulative number of COVID-19 patients hospitalized was helpful in theory but less so in practice, and we tried to guide our data users toward more valuable metrics, such as current hospitalization and new hospital-admissions numbers, instead.
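The undercount described above follows from simple arithmetic: a national sum can include only the states that publish a figure. The Python sketch below illustrates this with placeholder state names and made-up totals. The function `national_cumulative` and its inputs are hypothetical and do not represent the COVID Tracking Project's actual pipeline.

```python
from typing import Optional


def national_cumulative(
    state_totals: dict[str, Optional[int]],
) -> tuple[int, list[str]]:
    """Sum the reported totals and list the states that reported nothing.

    The returned sum is only a lower bound on the true national figure
    whenever the list of missing states is non-empty.
    """
    reported = [v for v in state_totals.values() if v is not None]
    missing = sorted(s for s, v in state_totals.items() if v is None)
    return sum(reported), missing


# Hypothetical inputs: None marks a state that never published the metric.
example = {
    "State A": 1_200,
    "State B": None,
    "State C": 800,
    "State D": None,
}

total, missing = national_cumulative(example)
print(f"national sum (lower bound): {total}")
print(f"states not reporting: {', '.join(missing)}")
```

With these invented inputs the national sum is 2,000, but it silently omits two of the four states. Surfacing the list of non-reporting states next to the total is one way to keep a partial sum from being read as a complete count.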

Recoveries

Our last of the five major metrics is one that sounds intrinsically hopeful and good, but in reality, it’s just as complicated as the others: recoveries.

Unfortunately, the recoveries metric shares many of the same challenges seen across COVID-19 data—it’s poorly defined, unstandardized, not reported in every state, and difficult to fully capture when case counts grow to scales that overwhelm state health departments.

What’s more, an additional layer of complexity looms over the recoveries metric, presenting a kind of philosophical dilemma. Scientists are still learning about the long-term health effects of COVID-19, even among asymptomatic cases. Declaring an individual “recovered” simply because they have avoided death can be misleading and insensitive.

For all these reasons, the COVID Tracking Project stopped reporting a national summary of recovery figures in November and decided to remove state-level recovery figures from our website in January. Instead of providing figures for recoveries, we began to track and display hospital discharges for the eight states providing those data, which had a clearer, more standardized meaning across states. As we wrote about state recovery metrics, our recommendation is that state health officials carefully consider how they discuss and quantify this information, choosing metrics such as “released from isolation” or “inactive cases” over labels that imply full recovery.

What We Have Learned, And What We Hope Happens Next

Over the past two months, a small crew at the COVID Tracking Project has been working to document our year of data collection, reflecting on how best to organize our project’s history so that journalists, policy makers, advocates, and the public might continue to find relevance in our work.

As we pored over our research on state reporting, we distilled our findings into a set of common reporting problems that made COVID-19 data especially difficult to aggregate on a national level. States tended to differ on how they defined data, what data they made available, and how they presented the data they did publish, making it difficult to compare data across states. All of those themes come through in the reporting arcs of these five COVID-19 metrics.

Some of these problems could have been avoided with clearer reporting guidance from the federal government; others were inevitable, given the constraints of the United States’ underfunded public-health infrastructure. But all of them tended to be poorly documented, meaning it took a great deal of excavation to uncover the sources of these problems—or even the existence of the problems themselves.

These data challenges may have been readily apparent to or expected by those familiar with the contours of public-health informatics. But pandemics affect us all, and the infrastructure that responds to them is meant to protect us all, so we all deserve to understand how capable the infrastructure is. Frankly, we need to understand its limitations to navigate through a pandemic.

Above and beyond any individual reporting practice, we believe that it was the lack of explanations from state governments and, most crucially, the CDC that led to misuse of data and wounded public trust. We tried our best to provide explanations where possible, and we saw transformation when we were able to get the message across to the public. Data users who were frustrated or even doubtful came to trust the numbers. Journalists reported more accurately. Hospitals could better anticipate surges.

If we could make just one change to the way state and federal COVID-19 data were reported, it would be to make an open acknowledgment of the limitations of public-health-data infrastructure whenever the data are presented. And if we could make one plea for what comes next, it’s that these systems receive the investment they deserve.

This article has been adapted from its original version, which can be read in full at The COVID Tracking Project.
