Model selection for incomplete and design-based samples

Stat Med. 2006 Jul 30;25(14):2502-20. doi: 10.1002/sim.2559.

Abstract

The Akaike information criterion (AIC) is one of the most frequently used methods for selecting one or a few good regression models from a set of candidate models. When the sample is incomplete, naive use of this criterion on the so-called complete cases can lead to the selection of poor or inappropriate models. A similar problem occurs when a sample based on a design with unequal selection probabilities is treated as a simple random sample. In this paper, we consider a modification of AIC based on reweighting the sample, in analogy with the weighted Horvitz-Thompson estimates. It is shown that this weighted AIC criterion provides better model choices for both incomplete and design-based samples. The use of the weighted AIC criterion is illustrated on data from the Belgian Health Interview Survey, which motivated this research. Simulations show its performance in a variety of settings.
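
The abstract does not state the formula for the weighted criterion, so the following is only a minimal sketch of one plausible reading, not the authors' definition. It assumes a Gaussian linear model, weights each observation's log-likelihood contribution by the inverse of its selection (or observation) probability in the spirit of Horvitz-Thompson, rescales the weights to sum to the sample size, and keeps the usual penalty of twice the number of parameters. The function name `weighted_aic` and the argument `pi` are hypothetical.

```python
import numpy as np


def weighted_aic(y, X, pi):
    """Illustrative weighted AIC for a Gaussian linear model.

    Assumptions (not taken from the abstract): weights are w_i = 1 / pi_i,
    rescaled to sum to n, and the penalty term is left at 2 * k.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Inverse-probability weights, rescaled so that they sum to n.
    w = 1.0 / np.asarray(pi, dtype=float)
    w = w * n / w.sum()

    # Weighted least squares, solved as ordinary least squares on
    # rows scaled by sqrt(w_i).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

    resid = y - X @ beta
    sigma2 = np.sum(w * resid**2) / n  # weighted ML estimate of the variance

    # Weighted Gaussian log-likelihood: each term is multiplied by w_i.
    loglik = -0.5 * np.sum(w * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2))

    k = p + 1  # regression coefficients plus the error variance
    return -2.0 * loglik + 2.0 * k
```

Under these assumptions, equal probabilities give the ordinary AIC of the unweighted fit, and candidate models would be compared by evaluating the function on each candidate design matrix with the same `y` and `pi` and preferring the smallest value.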

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Belgium / epidemiology
  • Computer Simulation
  • Decision Support Techniques*
  • Female
  • Humans
  • Likelihood Functions
  • Mass Screening
  • Middle Aged
  • Regression Analysis*
  • Research Design*
  • Uterine Cervical Neoplasms / epidemiology
  • Uterine Cervical Neoplasms / prevention & control
  • Vaginal Smears