Estimating the precision of a single proportion via a 100(1-α)% confidence interval in the presence of clustered data is an important statistical problem. Possible over-dispersion must be accounted for, for instance, in animal-based teratology studies with within-litter correlation, epidemiological studies that involve clustered sampling, and clinical trial designs with multiple measurements per subject. Several asymptotic confidence interval methods have been developed, but these have been found to have inadequate coverage of the true proportion for small-to-moderate sample sizes. Moreover, many of the best-performing of these intervals have not been directly compared with regard to the operating characteristics of coverage probability and empirical length. This study uses Monte Carlo simulations to estimate coverage probabilities and empirical lengths of five existing confidence intervals for clustered data across various true correlations, true probabilities of interest, and sample sizes. We also introduce a new score-based confidence interval method, which we find to have better coverage than existing intervals for small sample sizes under a wide range of scenarios.
Keywords: Clustered binary data; beta-binomial distribution; confidence interval; coverage; small sample.