GeM-LR: Discovering predictive biomarkers for small datasets in vaccine studies

PLoS Comput Biol. 2024 Nov 14;20(11):e1012581. doi: 10.1371/journal.pcbi.1012581. eCollection 2024 Nov.

Abstract

Despite significant progress in vaccine research, the level of protection provided by vaccination can vary significantly across individuals. As a result, understanding immunologic variation across individuals in response to vaccination is important for developing next-generation efficacious vaccines. Accurate outcome prediction and identification of predictive biomarkers would represent a significant step towards this goal. Moreover, in early phase vaccine clinical trials, small datasets are prevalent, raising the need and challenge of building a robust and explainable prediction model that can reveal heterogeneity in small datasets. We propose a new model named Generative Mixture of Logistic Regression (GeM-LR), which combines characteristics of both a generative and a discriminative model. In addition, we propose a set of model selection strategies to enhance the robustness and interpretability of the model. GeM-LR extends a linear classifier to a non-linear classifier without losing interpretability and empowers the notion of predictive clustering for characterizing data heterogeneity in connection with the outcome variable. We demonstrate the strengths and utility of GeM-LR by applying it to data from several studies. GeM-LR achieves better prediction results than other popular methods while providing interpretations at different levels.

MeSH terms

  • Algorithms
  • Biomarkers*
  • Computational Biology* / methods
  • Humans
  • Logistic Models
  • Vaccination / statistics & numerical data
  • Vaccines* / immunology

Substances

  • Biomarkers
  • Vaccines