Machine learning mathematical models for incidence estimation during pandemics

PLoS Comput Biol. 2024 Dec 23;20(12):e1012687. doi: 10.1371/journal.pcbi.1012687. eCollection 2024 Dec.

Abstract

Accurate estimates of the incidence of infectious diseases are key for the control of epidemics. However, healthcare systems are often unable to test the population exhaustively, especially when asymptomatic and paucisymptomatic cases are widespread; this leads to significant and systematic under-reporting of the real incidence. Here, we propose a machine learning approach to estimate the incidence of a pandemic in real time, using reported cases and the overall test rate. In particular, we use Bayesian symbolic regression to automatically learn the closed-form mathematical models that most parsimoniously describe incidence. We develop and validate our models using COVID-19 incidence values for nine different countries, confirming their ability to accurately predict daily incidence. Remarkably, despite the differences in epidemic trajectories and dynamics across countries, we find that a single model for all countries offers a more parsimonious description and predicts actual incidence better than separate models for each country. Our results show the potential to accurately model incidence in real time using closed-form mathematical models, providing a valuable tool for public health decision-makers.
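
To make the comparison between a single shared model and separate per-country models concrete, the following is a minimal sketch under loudly stated assumptions, not the procedure used in the paper. The closed-form model I = a · C · T^b (incidence I, reported cases C, test rate T) is a hypothetical form chosen for illustration; the abstract does not state the learned expressions. The data are synthetic, the three "countries" are placeholders, and the Bayesian information criterion (BIC) is swapped in here as a simple stand-in for a parsimony score; no symbolic regression is performed.

```python
import numpy as np

def fit_loglinear(log_c, log_t, log_i):
    """Least-squares fit of log I = log a + log C + b * log T.

    The coefficient on log C is fixed to 1 (hypothetical model form),
    so only log a and b are estimated. Returns (log_a, b, rss).
    """
    y = log_i - log_c
    X = np.column_stack([np.ones_like(log_t), log_t])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(coef[0]), float(coef[1]), float(resid @ resid)

def bic(n, k, rss):
    """BIC for a Gaussian-error model with k fitted parameters."""
    return n * np.log(rss / n) + k * np.log(n)

# Synthetic data: every placeholder country follows the SAME hypothetical model.
rng = np.random.default_rng(0)
true_log_a, true_b = 1.0, -0.5
countries = {}
for name in ["A", "B", "C"]:
    log_c = rng.uniform(4.0, 9.0, size=50)    # log reported cases
    log_t = rng.uniform(-7.0, -4.5, size=50)  # log test rate
    noise = rng.normal(0.0, 0.1, size=50)
    log_i = true_log_a + log_c + true_b * log_t + noise
    countries[name] = (log_c, log_t, log_i)

# One pooled model: 2 parameters shared by all countries.
all_c = np.concatenate([v[0] for v in countries.values()])
all_t = np.concatenate([v[1] for v in countries.values()])
all_i = np.concatenate([v[2] for v in countries.values()])
pooled_log_a, pooled_b, pooled_rss = fit_loglinear(all_c, all_t, all_i)

# Separate models: 2 parameters per country.
separate_rss = sum(fit_loglinear(*v)[2] for v in countries.values())

n = all_i.size
pooled_bic = bic(n, 2, pooled_rss)
separate_bic = bic(n, 2 * len(countries), separate_rss)

print(f"pooled   BIC: {pooled_bic:.1f}")
print(f"separate BIC: {separate_bic:.1f}")
```

The separate fits can never have a larger residual sum of squares than the pooled fit, so the comparison hinges on whether that small gain justifies the extra parameters. When the countries really do share one relationship, as in this synthetic setup, the pooled model should typically obtain the lower (better) BIC.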

MeSH terms

  • Bayes Theorem*
  • COVID-19* / epidemiology
  • Computational Biology / methods
  • Humans
  • Incidence
  • Machine Learning*
  • Models, Theoretical
  • Pandemics* / statistics & numerical data
  • SARS-CoV-2*

Grants and funding

This research was supported by projects PID2022-142600NB-I00 (M.SP. and R.G.) and PID2021-128005NB-C21 (A.A., C.G. and S.G.) from MCIN/AEI/10.13039/501100011033; by project 2021SGR-633 from the Government of Catalonia (all authors); and by project 2023PFR-URV-00633 from Universitat Rovira i Virgili (all authors). A.A., G.B., S.G. and C.G. acknowledge support from project CREXDATA no. 101092749 from the European Union’s Horizon Europe Programme, and from project no. 220020325 from the James S. McDonnell Foundation. A.A. also acknowledges the Joint Appointment Program at Pacific Northwest National Laboratory (PNNL). PNNL is a multi-program national laboratory operated for the U.S. Department of Energy (DOE) by Battelle Memorial Institute under Contract No. DE-AC05-76RL01830. This project has also received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 945413 (M.M.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.