A novel confidence interval for a single proportion in the presence of clustered binary outcome data

Stat Methods Med Res. 2020 Jan;29(1):111-121. doi: 10.1177/0962280218823231. Epub 2019 Jan 23.

Abstract

Estimating the precision of a single proportion via a 100(1-α)% confidence interval in the presence of clustered data is an important statistical problem. Possible over-dispersion must be accounted for in, for instance, animal-based teratology studies with within-litter correlation, epidemiological studies that involve clustered sampling, and clinical trial designs with multiple measurements per subject. Several asymptotic confidence interval methods have been developed, but they have been found to have inadequate coverage of the true proportion for small-to-moderate sample sizes. In addition, many of the best-performing of these intervals have not been directly compared with regard to the operating characteristics of coverage probability and empirical length. This study uses Monte Carlo simulations to calculate coverage probabilities and empirical lengths of five existing confidence intervals for clustered data across various true correlations, true probabilities of interest, and sample sizes. We also introduce a new score-based confidence interval method, which we find to have better coverage than existing intervals for small sample sizes under a wide range of scenarios.
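The kind of Monte Carlo evaluation described above can be illustrated with a minimal sketch. It is not the authors' code, and the interval it evaluates is a generic cluster-robust Wald-type interval chosen only for illustration; it is not claimed to be one of the five intervals compared in the study, nor the new score-based interval. The sketch assumes beta-binomial clustered data (as the keywords suggest) with equal cluster sizes, parameterised so that the intra-cluster correlation is rho = 1/(a + b + 1). The function name `simulate_coverage` and all of its parameters are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def simulate_coverage(p, rho, n_clusters, cluster_size,
                      n_sims=2000, alpha=0.05, seed=0):
    """Estimate coverage and mean length of a cluster-robust Wald interval.

    Illustrative assumptions: beta-binomial data, equal cluster sizes,
    0 < p < 1 and 0 < rho < 1.
    """
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)

    # Beta parameters giving mean p and intra-cluster correlation rho.
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho

    # One row per simulated study, one column per cluster.
    cluster_probs = rng.beta(a, b, size=(n_sims, n_clusters))
    successes = rng.binomial(cluster_size, cluster_probs)

    total = n_clusters * cluster_size
    p_hat = successes.sum(axis=1) / total

    # Ratio-estimator variance built from cluster-level residuals.
    resid = successes - cluster_size * p_hat[:, None]
    var = n_clusters / (n_clusters - 1) * (resid ** 2).sum(axis=1) / total ** 2

    half_width = z * np.sqrt(var)
    lower = np.clip(p_hat - half_width, 0.0, 1.0)
    upper = np.clip(p_hat + half_width, 0.0, 1.0)

    covered = (lower <= p) & (p <= upper)
    return float(covered.mean()), float((upper - lower).mean())


if __name__ == "__main__":
    for k in (5, 10, 50):
        cov, length = simulate_coverage(p=0.3, rho=0.2,
                                        n_clusters=k, cluster_size=5)
        print(f"clusters={k:3d}  coverage={cov:.3f}  mean length={length:.3f}")
```

Running the script prints the empirical coverage and mean interval length for a few cluster counts. Swapping in a different interval only requires changing how `lower` and `upper` are computed, which is how several methods could be compared under the same simulated scenarios.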

Keywords: Clustered binary data; beta-binomial distribution; confidence interval; coverage; small sample.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cluster Analysis
  • Confidence Intervals
  • Humans
  • Models, Statistical*
  • Monte Carlo Method
  • Prevalence
  • Probability
  • Pulmonary Disease, Chronic Obstructive / epidemiology
  • Sample Size
  • Siblings