Detection of Severe Lung Infection on Chest Radiographs of COVID-19 Patients: Robustness of AI Models across Multi-Institutional Data

André Sobiecki; Lubomir M Hadjiiski; Heang-Ping Chan; Ravi K Samala; Chuan Zhou; Jadranka Stojanovska; Prachi P Agarwal

doi:10.3390/diagnostics14030341

Diagnostics (Basel). 2024 Feb 5;14(3):341. doi: 10.3390/diagnostics14030341.

Authors

André Sobiecki¹, Lubomir M Hadjiiski¹, Heang-Ping Chan¹, Ravi K Samala², Chuan Zhou¹, Jadranka Stojanovska³, Prachi P Agarwal¹

Affiliations

¹ Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA.
² Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD 20993, USA.
³ Department of Radiology, New York University, New York, NY 10016, USA.

Abstract

Diagnosing severe COVID-19 lung infection is important because it carries a higher risk for the patient and requires prompt treatment with oxygen therapy and hospitalization, whereas patients with less severe lung infection often remain under observation. In addition, patients with severe infections are more likely to have long-standing residual changes in their lungs and may need follow-up imaging. We developed deep learning neural network models for classifying severe vs. non-severe lung infections in COVID-19 patients on chest radiographs (CXR). A deep learning U-Net model was developed to segment the lungs. Inception-v1 and Inception-v4 models were trained to classify severe vs. non-severe COVID-19 infection. Four CXR datasets from multi-country and multi-institutional sources were used to develop and evaluate the models. The combined dataset consisted of 5748 cases and 6193 CXR images, with physicians' severity ratings as the reference standard. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. We studied the reproducibility of classification performance using different combinations of training and validation datasets. We also evaluated the generalizability of the trained deep learning models using both independent internal and external test sets. On the independent test sets, the Inception-v1 based models achieved AUCs ranging from 0.81 ± 0.02 to 0.84 ± 0.0, while the Inception-v4 models achieved AUCs ranging from 0.85 ± 0.06 to 0.89 ± 0.01. These results demonstrate the promise of deep learning models for differentiating COVID-19 patients with severe lung infection from those with non-severe lung infection on chest radiographs.

Keywords: COVID-19; classification; deep learning; diagnosis; severe lung infection.
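
For readers unfamiliar with the AUC metric used in the abstract, the following is a minimal sketch of how an AUC can be computed for a binary severe vs. non-severe classifier and how a "mean ± spread" summary might be formed over repeated runs. It is not the authors' code. The function name `roc_auc`, the labels, the scores, and the per-run AUC values are all made up for illustration, and the use of a standard deviation across runs is an assumption; the abstract does not state how its ± values were obtained.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen severe case
    (label 1) receives a higher score than a randomly chosen non-severe
    case (label 0). Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical reference labels (1 = severe) and model scores, not study data.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 11 of 12 severe/non-severe pairs are ordered correctly

# Hypothetical AUCs from models trained on different training/validation
# splits, summarized as mean and sample standard deviation (an assumption).
run_aucs = [0.80, 0.82, 0.84]
print(f"AUC over runs = {mean(run_aucs):.2f} ± {stdev(run_aucs):.2f}")
```

The pairwise loop is quadratic in the number of cases and is only meant to make the definition concrete; for datasets of the size described above, a sorted or library-based implementation would be the practical choice.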

Grants and funding

A.S., H.-P.C., R.K.S., P.P.A., C.Z. and J.S. received no external funding. L.M.H. was partially funded through The Medical Imaging Data Resource Center (MIDRC), which is funded by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under Subcontract No. AWD101462-T (Contract No. 75N92020D00021).