Multiple imputation methods for handling incomplete longitudinal and clustered data where the target analysis is a linear mixed effects model

Md Hamidul Huque; Margarita Moreno-Betancur; Matteo Quartagno; Julie A Simpson; John B Carlin; Katherine J Lee

doi:10.1002/bimj.201900051

Biom J. 2020 Mar;62(2):444-466. doi: 10.1002/bimj.201900051. Epub 2020 Jan 9.

Authors

Md Hamidul Huque¹ ² ³, Margarita Moreno-Betancur¹ ², Matteo Quartagno⁴, Julie A Simpson⁵, John B Carlin¹ ² ⁵, Katherine J Lee¹ ²

Affiliations

¹ Murdoch Children's Research Institute, Parkville, Victoria, Australia.
² Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia.
³ University of New South Wales, Kensington, Sydney, Australia.
⁴ Institute for Clinical Trials and Methodology, University College London, London, United Kingdom.
⁵ Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Victoria, Australia.

Abstract

Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes, and when either the covariates alone or both the covariates and the outcome contain missing values. In the current paper, we compare the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings show that the relative performance of MI methods varies according to whether the incomplete covariate has fixed or random effects and whether there is missingness in the outcome variable. We show via simulation that compatible imputation and analysis models result in consistent estimation of both regression parameters and variance components. We illustrate our findings with the analysis of LSAC data.
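
For orientation, the kind of target analysis model described in the abstract can be written out explicitly. The following is a generic sketch of an LMM with a random intercept and a random slope, not the exact specification used in the paper. The symbols (y_ij, t_ij, x_ij, the β coefficients, b_0i, b_1i, Σ, σ²) and the choice to attach the random slope to time are assumptions made here for illustration.

```latex
% Generic LMM sketch (illustrative notation, not taken from the paper).
% i indexes clusters or individuals; j indexes observations within i.
% y_{ij}: outcome; t_{ij}: time (or another fully observed covariate);
% x_{ij}: a covariate that may be incomplete.
\[
  y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 x_{ij}
           + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij},
\]
\[
  \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}
    \sim N(\mathbf{0}, \boldsymbol{\Sigma}),
  \qquad
  \varepsilon_{ij} \sim N(0, \sigma^2).
\]
```

In this notation the regression parameters are the β coefficients, and the variance components are Σ and σ². If the random slope were attached to x_ij rather than t_ij, the incomplete covariate would have a random effect, which is the case the abstract distinguishes from an incomplete covariate with a fixed effect only.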

Keywords: clustered data; fully conditional specification; joint modeling; missing data; multiple imputation; repeated measurement.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biometry / methods*
Cluster Analysis
Linear Models
Longitudinal Studies

Grants and funding

MC_UU_12023/21/MRC_/Medical Research Council/United Kingdom