Objective: Given a predictive model for identifying very likely benign breast lesions on the basis of Breast Imaging Reporting and Data System (BI-RADS) mammographic findings, this study evaluated the model's ability to generalize to a patient data set from a different institution.
Materials and methods: The artificial neural network model underwent three trials: it was optimized over 500 biopsy-proven lesions from Duke University Medical Center ("Duke"), evaluated on 1,000 similar cases from the University of Pennsylvania Health System ("Penn"), and then reoptimized for the Penn data.
Results: Trial A's Duke-only model yielded 98% sensitivity, 36% specificity, area index (A(z)) of 0.86, and partial A(z) of 0.51. The cross-institutional trial B yielded 96% sensitivity, 28% specificity, A(z) of 0.79, and partial A(z) of 0.28. The decreases were significant for both A(z) (p = 0.017) and partial A(z) (p < 0.001). In trial C, the model reoptimized for the Penn data yielded 96% sensitivity, 35% specificity, A(z) of 0.83, and partial A(z) of 0.32. There were no significant differences compared with trial B for specificity (p = 0.44) or partial A(z) (p = 0.46), suggesting that the Penn data were inherently more difficult to characterize.
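For readers unfamiliar with these measures, the sketch below illustrates how sensitivity, specificity at a fixed high sensitivity, and an area under the ROC curve can be computed from model output scores. It is a minimal illustration under stated assumptions, not the study's method: the scores and labels are hypothetical toy values, the function names are invented for this example, and the area is the empirical (Mann-Whitney) estimate rather than the fitted-curve area that A(z) conventionally denotes. The partial A(z) is not computed here.

```python
# Hypothetical toy data, NOT from the study: model output scores and
# biopsy outcomes (1 = malignant, 0 = benign).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]


def sensitivity_specificity(scores, labels, threshold):
    """Call a case positive (suspicious) when its score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(scores, labels, target):
    """Return the highest observed score that, used as the threshold,
    keeps sensitivity at or above the target."""
    for t in sorted(set(scores), reverse=True):
        sens, _ = sensitivity_specificity(scores, labels, t)
        if sens >= target:
            return t
    raise ValueError("target sensitivity not reachable")


def empirical_auc(scores, labels):
    """Empirical ROC area: the fraction of (malignant, benign) pairs in
    which the malignant case scores higher, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Fix the operating point at a chosen sensitivity, then read off specificity.
threshold = threshold_for_sensitivity(scores, labels, target=1.0)
sens, spec = sensitivity_specificity(scores, labels, threshold)
print(f"threshold={threshold}, sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"empirical AUC={empirical_auc(scores, labels):.3f}")
```

In a cross-institutional test, one plausible reading is that the threshold is chosen on the first institution's data and then applied unchanged to the second; whether the study fixed the threshold this way is not stated in the text above.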
Conclusion: The BI-RADS lexicon facilitated the cross-institutional test of a breast cancer prediction model. The model generalized reasonably well, but the decreases in A(z) and partial A(z) were significant. At high sensitivities, the cross-institutional performance was encouraging because it did not differ significantly from that of a model reoptimized for the second data set. This study indicates the need for further work to collect more data and to improve the robustness of the model.