Rationale and objectives: Traditionally, multireader receiver operating characteristic (ROC) studies have used a "paired-case, paired-reader" design. The statistical power of such a design for inferences about the relative accuracies of the tests was assessed and compared with that of alternative designs.
Methods: The noncentrality parameter of an F statistic was used to compute power as a function of the reader and patient sample sizes and the variability and correlation between readings.
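As a minimal sketch of the general relationship between a noncentral F statistic and power (not the paper's specific expressions), power can be written as the upper-tail probability of a noncentral F distribution beyond the central F critical value. The symbols here are assumptions for illustration: \nu_1 and \nu_2 are the numerator and denominator degrees of freedom, \alpha is the Type I error rate, and \lambda is the noncentrality parameter, which in this setting would depend on the reader and patient sample sizes and on the variability and correlation between readings.

```latex
% Generic power of an F test; \nu_1, \nu_2, \alpha, \lambda are illustrative symbols.
% F_{1-\alpha;\,\nu_1,\nu_2} is the (1-\alpha) quantile of the central F distribution.
\[
  \text{Power} \;=\; \Pr\!\left( F_{\nu_1,\nu_2}(\lambda) \;>\; F_{1-\alpha;\,\nu_1,\nu_2} \right)
\]
% With \lambda = 0 the right-hand side reduces to \alpha; power increases as \lambda grows.
```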
Results: For a fixed power and Type I error rate, the traditional design reduces the number of verified cases required. A hybrid design, in which each reader interprets a different sample of patients, reduces the number of readers, the total number of readings, and the number of readings required per reader. The drawback is a substantial increase in the number of verified cases.
Conclusion: The ultimate choice of study design depends on the nature of the tests being compared, the limiting resources, a priori knowledge of the magnitude of the correlations and variability, and logistical complexity.