Data-driven nucleus subclassification on colon hematoxylin and eosin using style-transferred digital pathology

J Med Imaging (Bellingham). 2024 Nov;11(6):067501. doi: 10.1117/1.JMI.11.6.067501. Epub 2024 Nov 5.

Abstract

Purpose: Cells are building blocks for human physiology; consequently, understanding the way cells communicate, co-locate, and interrelate is essential to furthering our understanding of how the body functions in both health and disease. Hematoxylin and eosin (H&E) is the standard stain used in histological analysis of tissues in both clinical and research settings. Although H&E is ubiquitous and reveals tissue microanatomy, the classification and mapping of cell subtypes often require the use of specialized stains. The recent CoNIC Challenge focused on artificial intelligence classification of six types of cells on colon H&E but was unable to classify epithelial subtypes (progenitor, enteroendocrine, goblet), lymphocyte subtypes (B, helper T, cytotoxic T), and connective subtypes (fibroblasts). We propose to use inter-modality learning to label previously un-labelable cell types on H&E.

Approach: We took advantage of the cell classification information inherent in multiplexed immunofluorescence (MxIF) histology to create cell-level annotations for 14 subclasses. Then, we performed style transfer on the MxIF to synthesize realistic virtual H&E. We assessed the efficacy of a supervised learning scheme using the virtual H&E and 14 subclass labels. We evaluated our model on virtual H&E and real H&E.

Results: On virtual H&E, we were able to classify helper T cells and epithelial progenitors with positive predictive values of 0.34 ± 0.15 (prevalence 0.03 ± 0.01 ) and 0.47 ± 0.1 (prevalence 0.07 ± 0.02 ), respectively, when using ground truth centroid information. On real H&E, we needed to compute bounded metrics instead of direct metrics because our fine-grained virtual H&E predicted classes had to be matched to the closest available parent classes in the coarser labels from the real H&E dataset. For the real H&E, we could classify bounded metrics for the helper T cells and epithelial progenitors with upper bound positive predictive values of 0.43 ± 0.03 (parent class prevalence 0.21) and 0.94 ± 0.02 (parent class prevalence 0.49) when using ground truth centroid information.

Conclusions: This is the first work to provide cell type classification for helper T and epithelial progenitor nuclei on H&E.

Keywords: cell classification; domain shift; hematoxylin and eosin; multiplexed immunofluorescence; virtual hematoxylin and eosin; virtual staining.