Multiple Imputation for Longitudinal Data: A Tutorial

Stat Med. 2025 Feb 10;44(3-4):e10274. doi: 10.1002/sim.10274.

Abstract

Longitudinal studies are frequently used in medical research and involve collecting repeated measures on individuals over time. Observations from the same individual are invariably correlated, so an analytic approach that accounts for this clustering by individual is required. While almost all research suffers from missing data, missingness can be particularly problematic in longitudinal studies because participation often becomes harder to maintain over time. Multiple imputation (MI) is widely used to handle missing data in such studies. When using MI, it is important that the imputation model is compatible with the proposed analysis model. In a longitudinal analysis, this implies that the clustering considered in the analysis model should be reflected in the imputation process. Several MI approaches have been proposed to impute incomplete longitudinal data, such as treating repeated measurements of the same variable as distinct variables or using generalized linear mixed imputation models. However, the uptake of these methods has been limited, as they require additional data manipulation and the use of advanced imputation procedures. In this tutorial, we review the available MI approaches that can be used for handling incomplete longitudinal data, including where individuals are clustered within higher-level clusters. We illustrate implementation with replicable R and Stata code using a case study from the Childhood to Adolescence Transition Study.
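To make the "distinct variables" approach more concrete, the sketch below shows only the data manipulation it requires. Long-format data (one row per individual per wave) are reshaped into wide format (one row per individual, one column per wave) before imputation, then reshaped back for the longitudinal analysis. It is written in Python purely for illustration, since the tutorial itself uses R and Stata. The function names `long_to_wide` and `wide_to_long`, the column names `id`, `wave` and `y`, and the toy records are all hypothetical. The imputation step between the two reshapes, which would normally be run with a standard MI routine, is not shown.

```python
from collections import defaultdict


def long_to_wide(records, id_key="id", time_key="wave", value_key="y"):
    """Reshape long records (one dict per individual-wave) into wide rows
    (one dict per individual, one column per wave).

    Waves with no record for an individual become None, i.e. missing values
    that a single-level imputation model can treat as distinct variables.
    """
    waves = sorted({r[time_key] for r in records})
    by_id = defaultdict(dict)
    for r in records:
        by_id[r[id_key]][r[time_key]] = r[value_key]
    wide = []
    for pid in sorted(by_id):
        row = {id_key: pid}
        for w in waves:
            # .get() returns None when the wave was never recorded
            row[f"{value_key}_{w}"] = by_id[pid].get(w)
        wide.append(row)
    return wide, waves


def wide_to_long(wide, waves, id_key="id", time_key="wave", value_key="y"):
    """Reverse the reshape so the (imputed) data can be analysed with a
    longitudinal model that expects one row per individual-wave."""
    return [
        {id_key: row[id_key], time_key: w, value_key: row[f"{value_key}_{w}"]}
        for row in wide
        for w in waves
    ]


# Hypothetical toy data: individual 2 is lost to follow-up after wave 1.
records = [
    {"id": 1, "wave": 1, "y": 10.0},
    {"id": 1, "wave": 2, "y": 12.0},
    {"id": 1, "wave": 3, "y": 13.5},
    {"id": 2, "wave": 1, "y": 9.0},
]
wide, waves = long_to_wide(records)
print(wide)
# [{'id': 1, 'y_1': 10.0, 'y_2': 12.0, 'y_3': 13.5}, {'id': 2, 'y_1': 9.0, 'y_2': None, 'y_3': None}]

# ... impute y_2 and y_3 here with an MI routine, once per imputed dataset ...

long_again = wide_to_long(wide, waves)
```

Once the data are in wide format, an imputation model for `y_2` can condition on `y_1` and `y_3`, so the within-individual correlation is carried into the imputations. This approach is typically practical only when there is a small number of measurement occasions shared by all individuals.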

Keywords: clustered data; fully conditional specification; joint modeling; longitudinal data; missing data; multiple imputation.

MeSH terms

  • Adolescent
  • Child
  • Cluster Analysis
  • Data Interpretation, Statistical
  • Humans
  • Linear Models
  • Longitudinal Studies
  • Models, Statistical*