Convolutional Neural Networks for Segmentation of Pleural Mesothelioma: Analysis of Probability Map Thresholds (CALGB 30901, Alliance)

Mena Shenouda; Eyjólfur Gudmundsson; Feng Li; Christopher M Straus; Hedy L Kindler; Arkadiusz Z Dudek; Thomas Stinchcombe; Xiaofei Wang; Adam Starkey; Samuel G Armato III

doi:10.1007/s10278-024-01092-z


J Imaging Inform Med. 2024 Sep 12. doi: 10.1007/s10278-024-01092-z. Online ahead of print.


Affiliations

¹ Department of Radiology, The University of Chicago, Chicago, IL, 60637, USA.
² Icelandic Radiation Safety Authority, Reykjavik, Iceland.
³ Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA.
⁴ Metro Minnesota Community Oncology Research Consortium, St. Louis Park, MN, 55416, USA.
⁵ Duke Cancer Institute, Duke University, Durham, NC, 27710, USA.
⁶ Alliance Statistics and Data Management Center, Duke University, Durham, NC, 27710, USA.
⁷ Department of Radiology, The University of Chicago, Chicago, IL, 60637, USA. s-armato@uchicago.edu.

PMID: 39266911

Abstract

The purpose of this study was to evaluate the impact of the probability map threshold on pleural mesothelioma (PM) tumor delineations generated by a convolutional neural network (CNN). One hundred eighty-six CT scans from 48 PM patients were segmented by a VGG16/U-Net CNN. A radiologist modified the contours generated at a probability threshold of 0.5 to provide the reference standard. The percent difference in tumor volume and the spatial overlap, measured with the Dice Similarity Coefficient (DSC), were compared between this reference standard and the CNN outputs for thresholds ranging from 0.001 to 0.9. CNN-derived contours consistently yielded smaller tumor volumes than radiologist contours. Reducing the probability threshold from 0.5 to 0.01 decreased the mean absolute percent volume difference from 42.93% to 26.60%. Median and mean DSC ranged from 0.57 to 0.59, peaking at a threshold of 0.2; no distinct optimal threshold was found for percent volume difference. The CNN exhibited deficiencies with specific disease presentations, such as severe pleural effusion or disease in the pleural fissure. No single threshold applied to the CNN probability maps was optimal for both tumor volume and DSC. This study emphasizes the need to assess both figures of merit, tumor volume and spatial overlap, simultaneously when evaluating deep learning-based tumor segmentations across probability thresholds: while automated segmentations may yield tumor volumes comparable to those of the reference standard, the spatial region delineated by the CNN at a given threshold is equally important.
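The two figures of merit compared in the abstract, DSC and percent volume difference, can be computed across a sweep of probability thresholds roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes binary masks stored as NumPy arrays, that a voxel is labeled tumor when its probability is greater than or equal to the threshold (the abstract does not say whether the comparison is strict), and that percent volume difference is measured relative to the reference volume. The function names `dice`, `percent_volume_difference`, and `sweep_thresholds` are hypothetical.

```python
import numpy as np


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for boolean masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def percent_volume_difference(pred: np.ndarray, ref: np.ndarray) -> float:
    """Signed percent volume difference relative to the reference (assumed definition).

    Negative values mean the prediction is smaller than the reference.
    Voxel size cancels in the ratio, so counting voxels is sufficient.
    """
    v_pred = pred.astype(bool).sum()
    v_ref = ref.astype(bool).sum()
    return float(100.0 * (v_pred - v_ref) / v_ref)


def sweep_thresholds(prob_map: np.ndarray, ref_mask: np.ndarray, thresholds):
    """Binarize the probability map at each threshold and score it against the reference."""
    results = []
    for t in thresholds:
        pred = prob_map >= t  # assumption: ">=" rather than ">"
        results.append((t, dice(pred, ref_mask), percent_volume_difference(pred, ref_mask)))
    return results


if __name__ == "__main__":
    # Toy 1-D "scan": four reference tumor voxels.
    ref = np.array([0, 1, 1, 1, 1, 0], dtype=bool)
    prob = np.array([0.05, 0.3, 0.6, 0.9, 0.02, 0.0])
    for t, d, pvd in sweep_thresholds(prob, ref, [0.5, 0.01]):
        # 0.5  -> DSC 0.667, volume difference -50.0%
        # 0.01 -> DSC 0.889, volume difference +25.0%
        print(f"threshold={t}: DSC={d:.3f}, volume difference={pvd:+.1f}%")
```

In this toy case, lowering the threshold enlarges the predicted region, shrinking the volume error and raising the overlap at the same time. On real data the two metrics need not move together, which is why the abstract reports them separately.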

