*The following is the opinion and analysis of the writer:*

A mathematical formula can help determine the size of an epidemic. I hope I can show something of its charm, as well as its limitations.

Imagine a researcher whose task is to find the proportion of residents who are infected with a certain disease.

One state has a population of 500,000. Take a random sample of 1,000 inhabitants. This is only a tiny part of the population, and the proportion of the sample that has the disease may differ from the proportion in the population. It could be, for instance, that the proportion in the population as a whole is 41%, while the proportion in a particular sample is 39%.

A mathematical calculation gives a number that represents the largest error that one is likely to encounter with a sample size of 1,000. This number is quite small, roughly 3%.

So sampling on a modest, inexpensive scale gives a reasonable picture of the extent of the disease.

Another state has a population of 40 million. Again take a random sample of 1,000 inhabitants, an even tinier part of the population. The mathematical calculation again gives an error number. Here is a surprise: The number is the same, in the neighborhood of 3%. The fraction of the population that ends up in the sample is unimportant; what matters is the number of people in the sample.
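One way to see why the population size barely matters is the finite population correction, a standard textbook factor for sampling without replacement that the article itself does not use. The sketch below multiplies the square-root-of-1/n error number by that factor, sqrt((N - n)/(N - 1)), for both hypothetical states. With n = 1,000 the factor is within a tenth of a percent of 1 in each case, so both states come out at about 3.16%.

```python
import math

def error_with_fpc(n, population_size):
    """Square root of 1/n, scaled by the finite population correction."""
    # The correction shrinks the error only when the sample is a sizable
    # share of the population; here the share is tiny in both states.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(1 / n) * fpc

small_state = error_with_fpc(1_000, 500_000)
large_state = error_with_fpc(1_000, 40_000_000)

print(f"population    500,000: {small_state:.2%}")
print(f"population 40,000,000: {large_state:.2%}")
```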

The mathematical formula that is used is not complicated. If *n* is the number in the sample, compute 1/*n* (one divided by *n*). Then use your calculator to take the square root. For example, if *n* is 1000, then 1/*n* is 0.001, and the square root of this is 0.0316, which is about 3%.
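The same calculation can be written as a few lines of Python instead of done on a calculator. This is only a sketch of the rule as stated above; the function name `error_number` is made up for illustration. Running it for a few sample sizes also shows a consequence of the square root: making the sample 100 times larger only makes the error 10 times smaller, and halving the error requires quadrupling the sample.

```python
import math

def error_number(n):
    """The article's rule of thumb: the square root of 1/n."""
    return math.sqrt(1 / n)

for n in (100, 1_000, 10_000):
    print(f"n = {n:>6,}: about {error_number(n):.1%}")
```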

The formula is conservative: If the population proportion is close to zero or one, the errors tend to be smaller than the amount given by the formula. For a population proportion that is not close to zero or one, the formula gives essentially the right number.
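To illustrate why the rule is conservative, the sketch below compares it with the usual approximate error for a sample proportion, 1.96 times the square root of p(1 - p)/n. Reading the article's "largest error one is likely to encounter" as this 95% margin is an assumption; the article does not name a confidence level. The quantity p(1 - p) is largest at p = 0.5, where 1.96 × 0.5 is about 0.98. That is why the square root of 1/n slightly overstates the error there and overstates it more as p moves toward zero or one.

```python
import math

def conservative_error(n):
    """The article's rule: the square root of 1/n."""
    return math.sqrt(1 / n)

def error_at_proportion(p, n, z=1.96):
    """Approximate 95% margin of error when the true proportion is p (assumed z = 1.96)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1_000
for p in (0.50, 0.41, 0.05):
    print(f"p = {p:.2f}: {error_at_proportion(p, n):.2%} "
          f"vs rule {conservative_error(n):.2%}")
```

For n = 1,000 this gives about 3.10% at p = 0.50, 3.05% at p = 0.41 and 1.35% at p = 0.05, all below the rule's 3.16%.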

Another example: If the sample size is 100, the formula gives the square root of 1/100, which is 1/10 or 10%. Say the (unknown) proportion in the population is 41%. It would be quite normal for the researcher to find a sample proportion of 47%. Another sample might have only 32%. Such estimates are rather far from the true 41%. A sample size of 100 is too small for more accurate estimates.
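A quick simulation makes the spread visible. This sketch assumes the population is large enough that each sampled person can be treated as independently infected with probability 0.41; the function name and the seed are arbitrary choices for illustration. It draws many samples of 100 and reports how far the estimates wander and how often they land within 10 percentage points of the true 41%.

```python
import random

def sample_proportions(p, n, trials, seed=0):
    """Proportion infected in each of `trials` simulated samples of size n."""
    rng = random.Random(seed)
    # Each sampled person is infected with probability p (large-population assumption).
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

props = sample_proportions(0.41, 100, 10_000)
within = sum(abs(x - 0.41) <= 0.10 for x in props) / len(props)

print(f"lowest estimate {min(props):.0%}, highest {max(props):.0%}")
print(f"share within 10 points of 41%: {within:.1%}")
```

Most samples fall within the 10% error number, but individual estimates in the low 30s or high 40s are routine, much like the 32% and 47% in the example.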

The formula describes what happens with a random sample. An ideal random sampling procedure would list everyone in the population, use a random number generator to select a sample from the list, then locate and test each person from this sample. This ideal may be hard to achieve, but even imperfect sampling can give a useful picture of the real situation. Furthermore, samples of around 1,000 people are relatively inexpensive.

Information from a random sample is valuable, but it does not solve our current problem.

Beyond calibrating the magnitude of the problem, we need to identify and isolate individuals who carry the disease. It takes time for symptoms to appear, and people with the disease can spread it to many others before they are located.

So we must repeatedly test most of the population. This is much more expensive than sampling, and it might well exhaust the budget of an individual state.

The resources to do this are available on the federal level. A large-scale coordinated effort could suppress the epidemic and let us move on to more normal lives.

William Faris was a professor of mathematics at the University of Arizona from 1974 to 2011. Between 2014 and 2019, he spent three semesters as visiting professor at NYU Shanghai. He lives in Tucson.
