United States COVID-19 cases may be substantially underestimated

The United States may have experienced more than 6.4 million cases of COVID-19 by April 18, 2020, according to a probability analysis conducted by UC Berkeley School of Public Health researchers and published in Nature Communications. That is about nine times the number of confirmed cases in the same period, which was 721,245.

“The majority of this discrepancy was due to undertesting,” says Jade Benjamin-Chung, study co-author and professor of epidemiology & biostatistics at Berkeley Public Health. “Our findings highlight the urgent need for more SARS-CoV-2 [the virus that causes COVID-19] testing—including testing of asymptomatic individuals exposed to COVID-19—to help stop the spread of the virus.”

Jade Benjamin-Chung PhD, MPH

The first known case of COVID-19 in the US was confirmed on January 21, 2020. For the first few months of the pandemic, the US Centers for Disease Control and Prevention recommended that testing be prioritized for hospitalized patients who presented with moderate or severe symptoms. However, studies suggest that 30–70% of individuals who test positive for the virus have mild symptoms or none at all.

“We know that in the US, earlier on in the epidemic, the people who were getting tested had moderate to severe symptoms,” Benjamin-Chung said. “And we know that since then, we have a larger number of asymptomatic people who are affecting the total number of infections but may not be included in confirmed case counts.”

Berkeley Public Health’s Benjamin-Chung, Sean Wu, and colleagues estimated the total number of SARS-CoV-2 infections in each US state from February 28 to April 18, 2020, using a probabilistic bias analysis to account for incomplete testing and imperfect test accuracy. The authors estimate that there were 6,454,951 cases of SARS-CoV-2 infection (19 per 1,000 people). This estimate is about nine times the number of confirmed cases during the same period (2 per 1,000 people) and suggests that 89% of infections were undocumented. The majority of this difference (approximately 86%) was due to incomplete testing, with the remainder due to imperfect test accuracy.
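The headline ratio and the undocumented share follow directly from the two totals quoted in this article. The short Python sketch below reproduces that arithmetic only; it is not the authors' probabilistic bias analysis, and the function and variable names are illustrative assumptions.

```python
# Totals quoted in the article for February 28 to April 18, 2020.
ESTIMATED_INFECTIONS = 6_454_951
CONFIRMED_CASES = 721_245


def underascertainment(estimated: int, confirmed: int) -> tuple[float, float]:
    """Return (estimated-to-confirmed ratio, share of infections undocumented).

    Hypothetical helper for illustration; it performs simple division and
    does not model testing rates or test accuracy.
    """
    ratio = estimated / confirmed
    undocumented_share = 1 - confirmed / estimated
    return ratio, undocumented_share


ratio, undocumented_share = underascertainment(ESTIMATED_INFECTIONS, CONFIRMED_CASES)
print(f"Estimated infections are {ratio:.1f} times the confirmed count")
print(f"Share of infections undocumented: {undocumented_share:.0%}")
```

The ratio comes out just under nine, and the undocumented share rounds to 89%, consistent with the figures reported above.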

The authors found that COVID-19 incidence was highest in the Northeast, Midwest, and the state of Louisiana, whether measured by confirmed case counts or by the estimated number of infections. Underestimation of the number of cases was more common in Puerto Rico, California, and some southern states. In 33 states, the estimated number of infections was at least 10 times higher than the number of confirmed cases.

The authors note that their methodology does not incorporate a transmission model and so they are unable to make forecasts about the spread of the virus. However, they argue that their method provides a more realistic picture of infection burden at a given point in time.

“The method that we used can’t forecast the number of cases in the future. For future forecasts, we would need to use a transmission model,” said co-author and Berkeley Public Health PhD candidate Wu. “The number of cases we estimated in April could be used as a baseline for a transmission model that started then and would predict what would happen in the future.”


Related Information

Substantial underestimation of SARS-CoV-2 infection in the United States
Nature Communications

How much do COVID-19 case counts underestimate the size of the pandemic?
Nature Microbiology Community

