This is a difficult question we’ve spent most of the last year trying to answer.

There have been many questions we have been desperate to answer during this pandemic.

How fatal is COVID-19? What drugs can we use to treat it? If you’re working from home, is there really a reason to own more than one pair of formal trousers?

But one important question that has been hard to answer since the beginning of the pandemic sounds surprisingly simple: how many people have had COVID-19?

The simple answer of about 140 million, based on confirmed cases, is also quite obviously wrong. We’ve known since the beginning of the pandemic that confirmed cases, meaning those that are officially reported, usually on the basis of a positive PCR test, give us a substantial undercount of the true number of people who have been infected. There are a variety of reasons for this, including limited testing capacity and who goes to get a test: asymptomatic people usually don’t.

This is a problem, because we do really need to know how many people have been infected. It’s a useful number for determining statistics like the infection-fatality rate (IFR), which is a subject that I’ve published a few papers on, but it’s also important for monitoring the epidemic locally and looking at things like the herd immunity threshold.

So, how do we know how many people have been infected?

## Models and Tests

There are two primary ways that scientists have tried to calculate the total number of infected people in an area: using a mathematical model with some assumptions, and testing a lot of people to see if they have antibodies to the disease.

The mathematical models range from the very simple to the fiendishly complex, but ultimately they rely on their assumptions. So, for example, we have the US CDC’s estimate which put the number of infections at about 80 million in the US as of mid-January, based on extrapolating from the number of hospitalizations and deaths back to a rough estimate of the true number of infections. A similar but slightly more complex model for the US is at COVID projections by Youyang Gu.

These models are very useful, but they also have their drawbacks. Every model is only as good as its assumptions, and there are a lot of uncertainties in the mix. If our estimate of the IFR of COVID-19 is off, for example, then we may be totally wrong about how many people have been infected.
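To see how much rides on that one assumption, here is a minimal Python sketch of the simplest kind of back-calculation: divide deaths by an assumed IFR. The death count and the IFR values are made-up illustrative numbers, not the CDC’s actual inputs, and the function name is hypothetical. Real models also deal with reporting delays, age structure and hospitalization data, all of which this ignores.

```python
def infections_from_deaths(deaths: float, ifr: float) -> float:
    """Back-calculate infections from deaths, assuming one fixed IFR.

    `ifr` is a proportion (0.005 means 0.5%), not a percentage.
    """
    if not 0 < ifr <= 1:
        raise ValueError("ifr must be a proportion in (0, 1]")
    return deaths / ifr


# Hypothetical inputs, chosen only to illustrate the sensitivity.
deaths = 400_000
for ifr in (0.0025, 0.005, 0.01):
    estimate = infections_from_deaths(deaths, ifr)
    print(f"Assumed IFR {ifr:.2%}: about {estimate / 1e6:.0f} million infections")
```

Halving the assumed IFR doubles the estimated number of infections, so a modest error in the IFR moves the answer by tens of millions of people.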

And so, we do a type of study called a seroprevalence survey. This is basically a survey that takes blood samples to see how many people in a given population have antibodies to COVID-19. We have now done hundreds of these studies, which allows us to have a good look at how many people have probably been infected in various places across the world.

However, seroprevalence studies have issues as well. It is quite difficult to do a high-quality survey, because there are many ways that they can be biased. If your sample is drawn from a selected group, for example, then it probably isn’t going to be generalizable to the wider population. This is pretty simple epidemiology: you can’t just assume that, say, the infection rate in a single clinic where you took some blood samples is going to reflect the entirety of a massive city.

This is one form of an issue known as selection bias, and it’s not a small one. A recent study showed that even in a controlled setting, selection bias can more than double your estimate of the number of people who have been infected. It’s entirely possible that a study with a high degree of bias could produce an estimate many times higher (or, in some circumstances, lower) than the true number of infections.
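A rough way to see how this happens is to imagine that people who have been infected are more likely to turn up for an antibody test than people who haven’t. The Python sketch below works out the share of positives you would expect in such a sample. The prevalence and participation rates are invented for illustration and are not taken from the study mentioned above, and the function name is hypothetical.

```python
def observed_prevalence(true_prevalence: float,
                        participation_if_infected: float,
                        participation_if_not: float) -> float:
    """Expected share of positives in a sample when the chance of
    taking part depends on whether a person was infected."""
    positives = true_prevalence * participation_if_infected
    negatives = (1 - true_prevalence) * participation_if_not
    return positives / (positives + negatives)


true_prevalence = 0.05  # hypothetical: 5% of the population infected

# Everyone equally likely to take part: the sample matches the truth.
unbiased = observed_prevalence(true_prevalence, 0.10, 0.10)

# Hypothetical bias: infected people are three times as likely to volunteer.
biased = observed_prevalence(true_prevalence, 0.30, 0.10)

print(f"True: {true_prevalence:.1%}")
print(f"Unbiased sample: {unbiased:.1%}")
print(f"Biased sample: {biased:.1%}")
```

With these made-up numbers the biased sample suggests that roughly 14% of people have been infected when the truth is 5%, which is the kind of overestimate described above.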

In fact, a recent systematic review and meta-analysis showed something quite startling: out of 404 seroprevalence surveys that have been done across the world, only 82 of them were of medium or high quality and sufficiently good to include in an analysis. On top of this, only a small number of those high-quality studies were actually conducted in the general population, making it even harder to infer a true infection rate.

Worse still, these studies are very concentrated. There are dozens of pieces of research from Europe and the United States, but fewer studies that have been done in Africa. This is a global equity issue, but it is also bad if we want to know how many people have been infected with COVID-19. There are some really excellent local studies in some high-income places — we have a really good idea, for example, of the infection rates in the United Kingdom — but not nearly as many for at least half of the world’s population.

## The Question

All of this brings us back to the central question — how many people have been infected with SARS-CoV-2?

The short answer is that we really don’t know. The evidence we’ve got is perhaps sufficient to exclude truly vast numbers of infections — it’s unlikely in the extreme that billions of people have had the disease — but doesn’t really tell us much more than that.

The systematic review I was talking about earlier, which is currently the most up-to-date estimate of total global infections, estimated that somewhere around 8% of the globe had antibodies to SARS-CoV-2 as of December 2020. That means very roughly 600–700 million people infected by the end of last year.
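The arithmetic behind that figure is simple enough to check. The sketch below assumes a world population of about 7.8 billion in 2020, which is my round figure rather than a number from the review, and treats the 8% as applying evenly across the globe, which it almost certainly doesn’t.

```python
# Assumed round figure for the 2020 world population (not from the review).
WORLD_POPULATION_2020 = 7.8e9


def infected_count(seroprevalence: float, population: float) -> float:
    """Convert a seroprevalence proportion into a head count."""
    return seroprevalence * population


estimate = infected_count(0.08, WORLD_POPULATION_2020)
print(f"About {estimate / 1e6:.0f} million people")
```

That lands inside the range quoted above, but the precision is false comfort: every percentage point of seroprevalence is worth about 78 million people.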

But this was by no means a solid figure. While they concluded that “the majority of examined populations have not been infected”, the main finding of the paper was that better estimates were urgently needed worldwide to get at the true number of infections.

In reality, we don’t have a good handle on the true number of people who have had COVID-19, even a year into the pandemic. We have some quite good estimates for certain places, either those derived from mathematical models or those made using high-quality seroprevalence surveys, but we don’t really know, worldwide, how many infections there have been. A plausible range of infections as of the end of 2020 may have been 600–700 million, but given the massive outbreaks of the last few months, that number has increased enormously.

There are some places where we have quite good estimates of the total number of infections. In the UK, the Office for National Statistics has been regularly generating impressive figures that are extremely trustworthy. But for the rest of the world it’s hard to know exactly how many people have been infected so far.

Global infection rates are just another place where there is a massive amount of uncertainty. We have an idea of what the figure might be, and can probably exclude some of the highest and lowest estimates (it’s not going to be 100 million or 3 billion, in all likelihood), but within that range there’s a great deal of uncertainty around what the number really is.

So how many people have had COVID?

We don’t really know. Probably quite a few.

*Gideon Meyerowitz-Katz is an epidemiologist who tweets @GidMK*

*This piece was originally published at medium.com*