This post comments on the research underlying a recent article in the Belgian newspaper De Standaard of 29/8, entitled “Amerikaanse rekenkunde voorspelt moeilijke schoolstart” (“American arithmetic predicts a difficult start of the school year”).

Consider this article made available on the site of the University of California at San Francisco (based on research by pediatrician Dylan Chan MD): “To Stop COVID-19 Spread in Schools, Start with Local Data and Do the Math”. The article links to a Shiny app that calculates the risk of having at least one asymptomatically infected child in a class of a given size.

In this research report, the asymptomatic pediatric prevalence (expressed as a percentage) is related to the background 7-day incidence through a linear regression. Let us explain their calculation. Suppose the incidence in a given region, city, or neighborhood is known. Then:

$Prev = \text{pediatric asymptomatic prevalence (in \%)} = 0.23 + 0.0107 \times Incidence$

(On their web site, incidence is represented in terms of cases per week per 1000 population; we have modified it to the more commonly used incidence of cases per week per 100,000 population.) Example: City of Antwerp, incidence on August 26, 2020 is equal to 77 per week per 100,000. The derived prevalence of asymptomatic infections among children is then estimated, by these authors’ formula, as follows:

$Prev = (0.23 + 0.0107 \times 77)\% = 1.0539\% = 0.010539$

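With Prev expressed as a fraction rather than a percentage, and assuming that children in a class are infected independently of one another, the probability of at least one asymptomatic child in a class of size N is:

$Aprob(N) = 1 - (1 - Prev)^N$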


  • $Aprob(10) = 1 - (1 - 0.010539)^{10} = 10.1\%$
  • $Aprob(20) = 1 - (1 - 0.010539)^{20} = 19.1\%$
  • $Aprob(30) = 1 - (1 - 0.010539)^{30} = 27.2\%$
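The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the UCSF app: the function names are our own, and the coefficients 0.23 and 0.0107 are the ones quoted above, with incidence expressed in cases per week per 100,000.

```python
def pediatric_prevalence(incidence_per_100k: float) -> float:
    """Estimated asymptomatic pediatric prevalence, returned as a fraction.

    The regression gives a percentage, hence the division by 100.
    """
    return (0.23 + 0.0107 * incidence_per_100k) / 100.0


def prob_at_least_one(incidence_per_100k: float, class_size: int) -> float:
    """Probability of at least one asymptomatic child in a class.

    Assumes children are infected independently of one another.
    """
    prev = pediatric_prevalence(incidence_per_100k)
    return 1.0 - (1.0 - prev) ** class_size


if __name__ == "__main__":
    # City of Antwerp, incidence of 77 per week per 100,000
    for n in (10, 20, 30):
        print(f"N = {n}: {prob_at_least_one(77, n):.1%}")
```

Calling `prob_at_least_one` with another incidence value repeats the exercise for any other region.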

We can perform the same exercise for Belgium as a whole, where the 7-day incidence (based on the period between August 20, 2020 and August 26, 2020) is 26.63 cases per week per 100,000 individuals. The probability of at least one asymptomatic child in a class of size N = 20 then drops to 9.8%. For municipalities with a very low incidence, e.g., 10 cases per week per 100,000 individuals, we obtain a probability of 6.5% for a class of size N = 20.

Note, however, that there is a gap between the time of calculation and the time the schools open (September 1, 2020), which may affect the results. On top of that, there is a delay due to the incubation period, so the 7-day incidence at a given moment only reflects transmission in the past, not at that moment. Taking this into account requires extending the aforementioned approach.

We can also invert their formulas: we specify a class size N and a tolerated (low) probability of having at least one asymptomatically infected child in the class, and then work out the highest incidence that can be tolerated in the area:

$Incidence = 9324.30 - 9345.79 \times (1 - Aprob)^{1/N}$
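For readers who want to see where the two constants come from, here is a short derivation of our own. It writes Prev as a percentage, so that Prev/100 is the fraction used in the probability formula:

$Aprob = 1 - (1 - Prev/100)^N \;\Rightarrow\; Prev = 100 \times \left[1 - (1 - Aprob)^{1/N}\right]$

$0.23 + 0.0107 \times Incidence = 100 - 100 \times (1 - Aprob)^{1/N}$

$Incidence = \frac{99.77}{0.0107} - \frac{100}{0.0107} \times (1 - Aprob)^{1/N} \approx 9324.30 - 9345.79 \times (1 - Aprob)^{1/N}$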

Example: If a class is of size N = 10 and Aprob = 10% is tolerated, then we find that an incidence of 76 is tolerated. However, should the class size be N = 20, then the incidence has to go down to 28. For a class of size N = 20 and a tolerated Aprob = 5%, the incidence goes down to about 2.4. A limitation of their regression method is that it clearly does not perform well with very low tolerances (i.e., with very low tolerated Aprob). Because of the intercept of 0.23%, the model implies a nonzero prevalence even at zero incidence, so any tolerated Aprob below $1 - (1 - 0.0023)^N$ (about 4.5% for N = 20) yields a negative incidence. Arguably, more careful modeling than linear regression would be needed. This adds to the limitations already specified on the above cited web page.
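The inverse calculation, and the point at which it breaks down, can be checked with a small Python sketch. The function names are illustrative choices of ours and not part of the original tool; the coefficients are those of the regression quoted earlier.

```python
INTERCEPT_PCT = 0.23  # regression intercept, in percent
SLOPE_PCT = 0.0107    # regression slope, in percent per case per 100,000 per week


def tolerated_incidence(aprob: float, class_size: int) -> float:
    """Weekly incidence per 100,000 at which P(at least one case) equals aprob.

    A negative return value means the tolerance cannot be met under this model.
    """
    prev_pct = 100.0 * (1.0 - (1.0 - aprob) ** (1.0 / class_size))
    return (prev_pct - INTERCEPT_PCT) / SLOPE_PCT


def lowest_attainable_aprob(class_size: int) -> float:
    """Probability implied by the model at zero incidence (intercept only)."""
    return 1.0 - (1.0 - INTERCEPT_PCT / 100.0) ** class_size


if __name__ == "__main__":
    for aprob, n in [(0.10, 10), (0.10, 20), (0.05, 20), (0.04, 20)]:
        inc = tolerated_incidence(aprob, n)
        print(f"Aprob = {aprob:.0%}, N = {n}: incidence = {inc:.1f}")
    print(f"Lowest attainable Aprob for N = 20: {lowest_attainable_aprob(20):.1%}")
```

The last case in the loop (Aprob = 4%, N = 20) returns a negative incidence, which illustrates the limitation described above.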

Summary of our criticisms. These authors use observational data to link incidence and prevalence. The representativeness of their data needs to be examined very carefully. Also, the linear regression does not fit very well: very low incidences, combined with low required asymptomatic probabilities, may lead to negative numbers, which is impossible and an undesirable side effect of their model. A more refined, non-linear relationship needs to be derived, one that automatically respects the non-negativity of incidence. The fit of the model should be carefully examined. Further, it is important to take age into account. The current tool covers the entire age range of 0-18 years, which is too coarse. Evidently, a tool that fits well in one country does not automatically fit the situation in another country, so it would have to be tailored to the local situation. Last, but not least, a calculation of this type should always be viewed for what it is: an estimate, surrounded by uncertainty. In that sense, even when models are well-fitting and reliable, every estimate should be accompanied by a quantification of uncertainty (standard error, 95% confidence interval).

Conclusion. A tool like this can be relevant, as an important component in the discussion of school reopening and the associated risk, but refinement is needed. It can add to the evidence base in the school reopening discussion, but should not be used as the sole decisive factor. It should be based on properly corrected and properly modeled local data.