Multiple imputation for an incomplete covariate that is a ratio

Stat Med. 2014 Jan 15;33(1):88-104. doi: 10.1002/sim.5935. Epub 2013 Aug 6.

Abstract

We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable.
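The difference between passive and active imputation, and the role of the denominator's coefficient of variation, can be illustrated with a small toy sketch. This is not the authors' method or simulation design. It assumes only the numerator is missing (MCAR), uses plain normal linear imputation models with the observed denominator and the outcome as predictors, and performs simplified stochastic regression imputation: a proper multiple imputation would also draw the regression parameters from their posterior and pool results with Rubin's rules. All function names (`coefficient_of_variation`, `ols`, `impute_once`, `slope`, `demo`) and the simulated data are hypothetical, chosen for illustration. Passive imputation after log-transformation is not shown.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a positive mean)."""
    return statistics.stdev(values) / statistics.mean(values)


def solve(a, b):
    """Solve the linear system a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """Least squares with an intercept; returns (coefficients, residual standard deviation)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [y - sum(b * v for b, v in zip(beta, d)) for d, y in zip(design, ys)]
    sigma = (sum(e * e for e in resid) / (len(ys) - p)) ** 0.5
    return beta, sigma


def impute_once(x, w, y, target, rng):
    """Return one completed list of ratios x/w; x[i] is None where the numerator is missing.

    target="passive": model the numerator x given (w, y), then divide the draw by w.
    target="active":  model the ratio x/w given (w, y) and use the draw directly.
    """
    obs = [i for i, xi in enumerate(x) if xi is not None]
    predictors = [(w[i], y[i]) for i in obs]
    if target == "passive":
        response = [x[i] for i in obs]
    else:
        response = [x[i] / w[i] for i in obs]
    beta, sigma = ols(predictors, response)
    ratio = []
    for xi, wi, yi in zip(x, w, y):
        if xi is not None:
            ratio.append(xi / wi)  # observed ratio is kept as-is
            continue
        draw = beta[0] + beta[1] * wi + beta[2] * yi + rng.gauss(0.0, sigma)
        ratio.append(draw / wi if target == "passive" else draw)
    return ratio


def slope(xs, ys):
    """Least-squares slope of ys on xs (the analysis model: outcome on ratio)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def demo(n=2000, sd_log_w=0.3, p_missing=0.4, m=5, seed=1):
    """Simulated (invented) data in which the true outcome-on-ratio slope is 0.5."""
    rng = random.Random(seed)
    w = [rng.lognormvariate(0.0, sd_log_w) for _ in range(n)]  # denominator
    x_full = [rng.gauss(2.0, 0.5) for _ in range(n)]  # numerator
    y = [1.0 + 0.5 * xi / wi + rng.gauss(0.0, 0.5) for xi, wi in zip(x_full, w)]
    x = [None if rng.random() < p_missing else xi for xi in x_full]  # MCAR missingness
    result = {"cv_denominator": coefficient_of_variation(w)}
    for target in ("passive", "active"):
        # Point estimate only: mean slope over m imputed datasets (no Rubin's-rules variance).
        result[target] = statistics.mean(
            slope(impute_once(x, w, y, target, rng), y) for _ in range(m)
        )
    return result


if __name__ == "__main__":
    print(demo())
```

Varying the hypothetical `sd_log_w` parameter changes the denominator's coefficient of variation, so the passive and active slope estimates can be compared with the true value of 0.5 on either side of the roughly 0.1 threshold reported above. No particular numerical output is claimed for this toy setup, which is far simpler than the simulation study described in the abstract.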

Keywords: compatibility; missing data; multiple imputation; ratios.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem*
  • Body Mass Index
  • CD4 Lymphocyte Count
  • Cholesterol / blood
  • Cohort Studies
  • Computer Simulation
  • Female
  • HIV Infections / blood
  • HIV Infections / drug therapy
  • Hemoglobins / analysis
  • Humans
  • Male
  • Models, Statistical*
  • Neoplasms / metabolism
  • Regression Analysis*
  • South Africa

Substances

  • Hemoglobins
  • Cholesterol