The United States has had many problems coping with the coronavirus. A critical — and underappreciated — problem is bad data, which makes coping more difficult.
We still don’t know how many people have the virus, how many are hospitalized, how many are in intensive care units, and how many are on ventilators. There is poor data on testing availability, and testing results are too often incorrect, delayed, or not counted. Contact tracing, necessary for avoiding community spread of coronavirus, lacks both the needed data and the human or technological resources to use it. And for much of the pandemic, we have not known whether medical supplies were adequate, whether equipment was even working, or how quickly we could obtain crucial items such as personal protective equipment and ventilators or ramp up to produce them domestically.
Without good data, planners can’t plan, epidemiologists can’t model, policy makers can’t make policy, and citizens don’t trust what they’re told. Bad data has led to poor decisions — behavioral and policy-oriented — which in turn have prolonged the disease and contributed to unneeded suffering and death.
Pandemics and other public health crises (such as opioid overdoses, AIDS, and SARS) occur frequently. The U.S. needs a robust program to develop and make available the trusted data needed to prevent, mitigate, and deal with them, through professional management of the data supply chain. We are losing the battle on COVID-19 data, so we must act quickly. We must put in place a system and a set of policies that can help fight future pandemics and public health crises.
Anatomy of a Data Disaster
The U.S. public health care system, like all industries, had many data quality problems before this pandemic. COVID-19 has brought these weaknesses into sharp relief. The Centers for Disease Control and Prevention (CDC) has played an effective role in fighting pandemics in the past, but it has focused less on a strong set of data policies and data quality standards. And the federated approach we’ve adopted to manage this pandemic, with each state choosing its own path to disease reporting and treatment, has been particularly unsuccessful.
Basic data on numbers of cases and “death due to coronavirus” are reported differently by different states. Some report presumed cases and deaths, and others do not. Some report on cases and deaths of nonresidents that occur in the state or in long-term care facilities, prisons, and business sites — and others do not. Some report on cases and deaths in all hospitals, whereas others rely on samples.
The net results are both a significant undercount and a hodgepodge of figures, making predictions and comparisons difficult. One is forced to conclude that the data needed to manage the COVID-19 pandemic is effectively unmanaged. This is an acute problem, demanding urgent, professional attention.
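To make the comparison problem concrete, here is a minimal sketch in Python. It assumes two hypothetical states with identical underlying deaths, one of which publishes presumed deaths and one of which does not. The class, field names, and figures are all invented for illustration and do not come from any real reporting system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateReport:
    """One state's figures. All field names are hypothetical."""
    state: str
    confirmed_deaths: int
    presumed_deaths: int     # deaths attributed to the virus without a confirming test
    reports_presumed: bool   # whether this state publishes presumed deaths


def published_deaths(report: StateReport) -> int:
    """The total a state publishes under its own definition."""
    if report.reports_presumed:
        return report.confirmed_deaths + report.presumed_deaths
    return report.confirmed_deaths


def standardized_deaths(report: StateReport) -> int:
    """The total under one shared definition: confirmed plus presumed."""
    return report.confirmed_deaths + report.presumed_deaths


# Invented figures: both states have the same underlying deaths.
reports = [
    StateReport("State A", confirmed_deaths=100, presumed_deaths=20, reports_presumed=True),
    StateReport("State B", confirmed_deaths=100, presumed_deaths=20, reports_presumed=False),
]

for r in reports:
    print(r.state, published_deaths(r), standardized_deaths(r))
```

Under their own definitions the two states publish different totals even though nothing differs on the ground. Only the shared definition makes the figures comparable, and a state that never records presumed deaths at all cannot be standardized after the fact.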
There are, of course, precursors to this story. The landmark report in 2000 by the Institute of Medicine (now the National Academy of Medicine), To Err Is Human, gave its best estimate of the number of people who die yearly from preventable medical errors, primarily hospital-acquired infections, as between 44,000 and 98,000. That is an incredibly wide range for something deemed a national priority for some 20 years. Likewise, there is widespread disagreement about the best way to define and measure hospital readmissions, and different states report maternal and infant mortality differently. Many of the issues that have bedeviled COVID-19 data, including a lack of consensus around data definitions and uneven reporting processes, are also found in public health data more broadly.
Many of the problems in pandemic and public health data arise from the front lines of health care provision — hospitals. Many hospitals still have data-related quality issues with wrong-site surgeries, medication errors, blood type errors, unrecorded allergies, misread radiology reports, and missing contact information. The relatively recent rise of electronic health record (EHR) systems, and federal standards for “meaningful use,” have addressed the problem somewhat. But data standards for the pandemic are not codified in EHRs, and data on the equipment needed to fight it isn’t in them at all.
Observers of this pandemic in the U.S. are not blind to data issues and take steps, often extraordinary, to deal with them. Finding official sources wanting, media companies such as The New York Times, The Atlantic, and National Geographic, and university groups at Johns Hopkins and the University of Washington, provide summaries of U.S. and global cases. Nonprofits have also jumped in to try to determine the needs of hospitals in hot spots and attempt to match critical suppliers to hospitals with the greatest needs.
While we applaud their efforts to fill the gaps, such workarounds shouldn’t be necessary, and they lead to multiple versions of pandemic truth, adding cost and uncertainty. They also build a false sense of confidence: The New York Times, after all, reports very specific death counts, which camouflages the uncertainty and severity of the data problems and distracts people from addressing the root causes.
COVID-19 Data Steps That Must Be Taken Now
So, what should the health care system do to address its data management issues? We’ve identified four steps that are crucial for our current moment:
Embrace having one agency consistently in charge of data collection. This pandemic underscores the need for a single, trusted national agency charged with setting federal standards for cases, deaths, testing rates, and other key data. This agency would provide complete, accurate data on the incidence of this disease, future pandemics, and other threats to public health.
Viruses and other diseases don’t respect state boundaries, and leaving states or even smaller administrative units to gather and report data however they wish will not work. A fully centralized approach, run by a single data management agency, is in fact what countries with much better records at controlling the pandemic have adopted: Singapore, South Korea, Ireland, and Japan, for example, all use centralized data reporting agencies.
Recently, the federal government changed primary responsibility for the national collection of COVID-19 data from the CDC to its parent agency, the Department of Health and Human Services. A month after the announcement, the government said it would switch the responsibility back to the CDC, where it is creating a “revolutionary new data system” to manage such data. We tend to be skeptical of revolutionary breakthroughs in data systems, particularly when new systems were also touted as the reason for transferring the responsibility to HHS in the first place. We also question the wisdom of making the shift from one agency to another — and back again — in the middle (or so we hope) of a pandemic. It has led to delays and inaccuracies in data reporting for several weeks. A number of public health experts argued recently in a letter that changing to the new system left hospitals “scrambling to determine how to meet daily reporting requirements.”
We’re not sure which subagency within HHS is the best home for COVID-19 data management and reporting, but HHS is where this responsibility belongs overall. It is the federal agency charged with monitoring and improving health and health care. It also has primary responsibility for government-funded care reimbursement, which gives it a motivational “stick” to punish hospitals and public health departments that don’t supply data in the right formats.
Task whatever agency is managing the data with advocating for policy based on data. Creating an organizational home for COVID-19 data, and adopting some new technologies and service providers, are just the beginning of the solution.
The thrust of our recommendations is more about professional management and leadership of the data supply chain than about a new owner or a new database. Given current weaknesses in the supply chain, for instance, improving its function will require clear policies and perhaps hundreds of skilled professionals working with individual hospitals to make sure the hospitals understand those policies and develop the capabilities to meet them.
The agency managing COVID-19 data must create clear data definitions and centralize the maintenance of all data needed to guide public health policy and implementation. It must develop and manage the data supply chains such that all those contributing understand what is expected, and it must implement measurements and controls to make sure the data meets all requirements. It must build new capabilities to spot threats more quickly and engage with others to develop the data needed to fight them off. Data must be viewed as a national resource, protected by and subject to federal law restricting the release of medical information, but also anonymized and made available so others can analyze it. Doing this work will be incredibly demanding. But having the correct data is the only way that HHS can provide the needed leadership for this pandemic and for all elements of its overall mission.
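What "clear data definitions" plus "measurements and controls" might look like in practice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the required types, and the `validate_report` function are hypothetical and are not drawn from any HHS or CDC specification.

```python
from datetime import date
from typing import Dict, List

# Hypothetical shared data definition: field name -> required type.
REQUIRED_FIELDS: Dict[str, type] = {
    "facility_id": str,
    "report_date": date,
    "confirmed_cases": int,
    "presumed_cases": int,
    "icu_patients": int,
    "ventilators_in_use": int,
}


def validate_report(report: dict) -> List[str]:
    """Return a list of problems found in one submitted report (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in report:
            problems.append(f"missing field: {name}")
            continue
        value = report[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
        elif expected is int and value < 0:
            problems.append(f"negative count for {name}")
    return problems


# Invented example submission with three defects.
bad_report = {
    "facility_id": "H-002",
    "report_date": "2020-08-01",   # a string, not a date
    "confirmed_cases": -1,         # counts cannot be negative
    "presumed_cases": 0,
    "ventilators_in_use": 1,       # icu_patients is missing entirely
}

for problem in validate_report(bad_report):
    print(problem)
```

A check like this is the easy part. The harder work described above is agreeing on the definitions in the first place and helping every contributor build the capability to meet them.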
Demand that agency leadership foster a data-driven culture. HHS or the CDC must build the needed culture starting at the very top. It must have a senior data leader, most likely a chief data officer. (HHS had one, but she resigned in January 2020 and has not yet been replaced; its CIO recently resigned as well. The CDC has not yet hired a chief data officer but is attempting to recruit one during the pandemic.) It must hire and train a large cadre of seasoned professionals, well versed in both data and health care, and mold them into a world-class team focused on making HHS a trusted, data-driven agency.
Politics is the elephant in the room, and it is clear that pandemic and other public health data must be made as immune as possible to political influence. The absence of strong federal standards means that state governments can report data that misrepresents their performance in addressing the disease. Florida, for example, fired a data dashboard designer who insisted upon reporting the correct positivity rate.
We have well-designed structures in place for resisting political influence on the reporting of unemployment data, and we can apply them to pandemic data as well. Politics also underscores the importance of extremely high-quality data: Throughout our careers, we’ve observed that people fight less about the data when those responsible are completely transparent about their methods, publish quality statistics, readily acknowledge weaknesses, and address those weaknesses aggressively.
Professionalize data management at providers of key pandemic and public health data as well. It’s clear that hospitals and public health systems will need to be fully engaged in the effort to improve the quality of data for COVID-19 and subsequent pandemics. This is a chronic problem that demands long-term, professional attention.
Hospitals, drug companies, labs, insurers, county health departments, and other data providers must professionalize their data management. They should put someone in charge of data, preferably reporting to the CEO. They must see themselves as essential to the data supply chain and learn to be better data customers and suppliers.
Everyone in the health care industry touches data and thus has a role to play in data quality, and hospitals and others must teach people these roles. In particular, those who spot bad data must be encouraged to do more than simply work around the issue — they must call it out, help sort out the root causes, and eliminate them. Finally, providers must get patients involved in building and maintaining complete and accurate records as well.
It’s impossible to know just how many lives and dollars poor management of COVID-19 data has cost the U.S., in part because we don’t have good data. But simply accepting the human and economic toll of future pandemics and other public health crises is untenable. Data can be our best weapon for fighting pandemics. We need better approaches to acquiring, managing, and using that data.