A novel confidence interval for a single proportion in the presence of clustered binary outcome data

Meghan I Short; Howard J Cabral; Janice M Weinberg; Michael P LaValley; Joseph M Massaro

doi:10.1177/0962280218823231

Stat Methods Med Res. 2020 Jan;29(1):111-121. doi: 10.1177/0962280218823231. Epub 2019 Jan 23.

Authors

Meghan I Short¹, Howard J Cabral¹, Janice M Weinberg¹, Michael P LaValley¹, Joseph M Massaro¹

Affiliation

¹ Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.

PMID: 30672389
DOI: 10.1177/0962280218823231

Abstract

Estimating the precision of a single proportion via a 100(1-α)% confidence interval in the presence of clustered data is an important statistical problem. It is necessary to account for possible over-dispersion, for instance, in animal-based teratology studies with within-litter correlation, epidemiological studies that involve clustered sampling, and clinical trial designs with multiple measurements per subject. Several asymptotic confidence interval methods have been developed, but they have been found to have inadequate coverage of the true proportion for small-to-moderate sample sizes. In addition, many of the best-performing of these intervals have not been directly compared with regard to the operating characteristics of coverage probability and empirical length. This study uses Monte Carlo simulations to calculate coverage probabilities and empirical lengths of five existing confidence intervals for clustered data across various true correlations, true probabilities of interest, and sample sizes. In addition, we introduce a new score-based confidence interval method, which we find to have better coverage than existing intervals for small sample sizes under a wide range of scenarios.
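The kind of simulation the abstract describes can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: clusters of equal size, outcomes drawn from a beta-binomial model parameterised by the true proportion p and intracluster correlation rho, and a generic cluster-robust Wald interval based on the ratio-estimator variance. That interval is a stand-in only; it is neither the authors' score-based interval nor necessarily one of the five intervals they compare. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_clusters(k, n, p, rho, rng):
    """Draw k cluster counts (each out of n) from a beta-binomial model.

    Assumed parameterisation: a = p(1-rho)/rho, b = (1-p)(1-rho)/rho,
    which gives mean p and intracluster correlation rho = 1/(a+b+1).
    """
    if rho <= 0:
        probs = [p] * k  # no over-dispersion: plain binomial clusters
    else:
        a = p * (1 - rho) / rho
        b = (1 - p) * (1 - rho) / rho
        probs = [rng.betavariate(a, b) for _ in range(k)]
    return [sum(rng.random() < q for _ in range(n)) for q in probs]


def cluster_wald_interval(counts, n, alpha=0.05):
    """Generic cluster-robust Wald interval (illustrative stand-in only)."""
    k = len(counts)
    total = k * n
    p_hat = sum(counts) / total
    # Between-cluster sum of squares around the pooled proportion.
    ss = sum((y - n * p_hat) ** 2 for y in counts)
    var = k / (k - 1) * ss / total ** 2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


def estimate_coverage(k, n, p, rho, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo estimate of (coverage probability, mean interval length)."""
    rng = random.Random(seed)
    hits = 0
    total_len = 0.0
    for _ in range(reps):
        counts = simulate_clusters(k, n, p, rho, rng)
        lo, hi = cluster_wald_interval(counts, n, alpha)
        hits += lo <= p <= hi
        total_len += hi - lo
    return hits / reps, total_len / reps


if __name__ == "__main__":
    # Hypothetical scenario: 10 clusters of 5 subjects, p = 0.2, rho = 0.2.
    cov, length = estimate_coverage(k=10, n=5, p=0.2, rho=0.2)
    print(f"coverage={cov:.3f}, mean length={length:.3f}")
```

Running `estimate_coverage` over a grid of cluster counts, proportions, and correlations would give a table of coverage and length for this one interval. Any other interval can be compared by swapping out `cluster_wald_interval`.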

Keywords: Clustered binary data; beta-binomial distribution; confidence interval; coverage; small sample.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Cluster Analysis
Confidence Intervals
Humans
Models, Statistical*
Monte Carlo Method
Prevalence
Probability
Pulmonary Disease, Chronic Obstructive / epidemiology
Sample Size
Siblings