Purpose: To evaluate the robustness of a breast ultrasonographic (US) computer-aided diagnosis (CAD) system in terms of its performance across different patient populations.
Materials and methods: Three US databases were analyzed: one from South Korea and two from the United States. All three databases were used in an institutional review board-approved and HIPAA-compliant manner. Round-robin analysis and independent testing were performed to evaluate the performance of a computerized breast cancer classification scheme within and across the databases. Receiver operating characteristic (ROC) analysis was used to evaluate performance differences.
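The difference between the two evaluation modes can be illustrated with a minimal sketch. It is not the classification scheme used in the study: the nearest-centroid classifier, the synthetic two-feature databases, and all function names (`auc`, `fit`, `score`, `round_robin_auc`, `independent_auc`, `synthetic_db`) are assumptions introduced for illustration, and round-robin analysis is read here as leave-one-case-out testing within a single database. The area under the ROC curve is computed nonparametrically from the Mann-Whitney statistic.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(features, labels):
    """Toy stand-in classifier (assumption): store the mean feature vector per class."""
    dim = len(features[0])
    means = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(features, labels) if y == cls]
        means[cls] = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return means

def score(model, x):
    """Higher score means x lies closer to the malignant (1) centroid than the benign (0) one."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, model[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, model[1]))
    return d0 - d1

def round_robin_auc(features, labels):
    """Leave-one-case-out: each case is scored by a model trained on all other cases."""
    scores = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        scores.append(score(fit(train_x, train_y), features[i]))
    return auc(scores, labels)

def independent_auc(train_x, train_y, test_x, test_y):
    """Independent testing: train on one database, score a different one."""
    model = fit(train_x, train_y)
    return auc([score(model, x) for x in test_x], test_y)

def synthetic_db(n, shift, seed):
    """Hypothetical database: two Gaussian features; `shift` mimics a population difference."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2  # alternate benign (0) and malignant (1) cases
        xs.append([rng.gauss(1.5 * y + shift, 1.0), rng.gauss(1.0 * y, 1.0)])
        ys.append(y)
    return xs, ys

if __name__ == "__main__":
    db_a = synthetic_db(100, shift=0.0, seed=1)
    db_b = synthetic_db(100, shift=0.8, seed=2)
    print("round-robin on A:", round_robin_auc(*db_a))
    print("train A, test B: ", independent_auc(*db_a, *db_b))
```

In this sketch the round-robin estimate draws training and test cases from the same database, whereas the independent test trains and tests on different databases, so any shift in feature distributions between populations affects only the latter.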
Results: The round-robin analyses of each database demonstrated similar results, with areas under the ROC curve ranging from 0.88 (95% confidence interval [CI]: 0.820, 0.918) to 0.91 (95% CI: 0.86, 0.95). In independent testing, performance was also similar across databases, but the range in areas under the ROC curve (from 0.79 [95% CI: 0.730, 0.842] to 0.87 [95% CI: 0.794, 0.923]) was wider than in the round-robin tests. Statistically significant differences in performance occurred only when the Korean database was used as the test set in independent testing.
Conclusion: The few observed statistically significant differences in performance indicated that although the US features used by the system were useful across the databases, their relative importance differed. In practice, a CAD system may therefore need to be adjusted when applied to a different population.