VoxEU Column COVID-19

How widespread is coronavirus in New York? We need to know

29 Mar 2020

Coronavirus is already widespread but the true number of cases in the population may be as much as 50 or 100 times the number currently testing positive. This column argues that we need accurate population-level numbers if we want to make reasonable policy decisions. It suggests that a random sample of 5,000 people is large enough to pin down the prevalence of COVID-19 infection, and will improve our understanding of transmission and of how prevention measures are working.

David Canning

Richard Saltonstall Professor of Population Sciences, Harvard T.H. Chan School of Public Health

David Bloom

Clarence James Gamble Professor of Economics and Demography, Harvard University

As of 29 March, over 52,000 people had tested positive for SARS-CoV-2, the virus that causes COVID-19, in New York State. But how many are really infected? We don’t know.

Only those who have symptoms or are otherwise considered at high risk due to contact with individuals confirmed to be infected are likely to be tested. Moreover, many apparently symptomatic individuals have been unable to get tested, and even more people may be infected but remain asymptomatic, fail to recognise their symptoms, or fail to take them seriously.

We urgently need to know the actual numbers of symptomatic and asymptomatic infections in the community if we want to accurately assess the epidemic and make the right policy decisions.

The true number of cases in the population must be higher than the number who have tested positive so far, but we don’t know by how much. Very conservative estimates would put actual case totals at 2 or 3 times the number of positive tests. Epidemiological modelers at Harvard and Johns Hopkins have suggested that the true figure might be as much as 10 or 20 times higher than has been reported. And in the UK, recent estimates based on symptoms rather than testing suggest that the real ratio of cases to positive tests might be even higher, perhaps as much as 50- or 100-fold.

How can we determine the actual number? In an ideal world, we would test everyone and test them repeatedly. But constraints on the number of available test kits, other essential medical supplies, laboratory capacity, and personnel render such an approach impossible at this time.

A feasible, alternative approach is to test a representative random sample of 5,000 people in New York State. Based on our statistical calculations, this sample size is large enough to pin down the prevalence of COVID-19 infection accurately within a margin of error of 1.5 percentage points. Employing this strategy would be akin to using exit polls to predict the outcome of an election – which has been done with decent success historically.
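The column does not spell out the calculation behind the 1.5 percentage point figure. As a rough check, the sketch below uses the textbook normal approximation for a proportion estimated from a simple random sample, at a 95% confidence level. The function name `margin_of_error` and the example prevalence values are illustrative assumptions, not taken from the column, and the sketch ignores non-response, clustering within households, and imperfect test sensitivity and specificity, all of which would widen the margin in practice.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    Assumes a simple random sample of size n and the normal approximation;
    p is the assumed true prevalence (0.5 is the worst case), and z = 1.96
    is the standard normal quantile for a 95% interval.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


n = 5000  # sample size proposed in the column

# Hypothetical prevalence levels, chosen only to show how the margin varies.
for p in (0.01, 0.05, 0.20, 0.50):
    moe = margin_of_error(n, p)
    print(f"assumed prevalence {p:.0%}: +/- {100 * moe:.2f} percentage points")
```

Under these assumptions the worst case (a prevalence of 50%) gives a margin just under 1.5 percentage points, and the margin narrows as the true prevalence moves away from 50%.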

In addition to testing, we should take people’s temperatures and ask about symptoms; we should do this for both the individuals in the sample and others in their households. We should also ask about any precautionary measures the sampled individuals are taking, including social distancing, so we can better understand who is getting infected and improve our transmission models and prevention advice.

To implement such a sampling strategy, a number of elements and precautions must be in place. First, we have to ensure the safety of the testers, as well as those they visit, by provisioning the testers with adequate protective gear. Second, we need people to consent to being tested; we can appeal to their public responsibility, but also offer prompt medical treatment for those who test positive.

Finally, there is the issue of securing enough test kits and the accompanying supplies, all of which remain scarce. With testing still being rationed among the acutely ill, setting aside kits for community surveillance may seem like a secondary or even tertiary concern. To be sure, those who are ill and need a diagnosis should remain a priority, but we also need to know how many are infected in the community and potentially spreading the disease. As more test kits become available, setting aside a portion to achieve a basic level of community surveillance is likely to pay tremendous dividends. In addition, we would benefit greatly from knowing how many have recovered from infection and are now potentially immune, but this cannot be achieved until a high-quality serological antibody test is developed and produced at scale.

A key question at the moment is when the state can move to less-restrictive social distancing measures focused primarily on isolating those with active infection and quarantining their known contacts. The high number of confirmed cases, considered along with reports of hospitals already feeling significant strain, strongly suggests that the current rate of infection is too high to transition to this type of case-control method at the moment. Still, there is reason to hope that we can pursue a more targeted strategy once present measures have succeeded in resetting the epidemic curve. But we can only do so safely if we have confidence in our ability to accurately assess the prevalence of infection and its trajectory.

Our capacity for prevention and control of COVID-19 infection will be greatly enhanced by better surveillance data. Fortunately, such data are well within reach, and constitute an offer we must not refuse.