Capturing cell heterogeneity in representations of cell populations for image-based profiling using contrastive learning

PLoS Comput Biol. 2024 Nov 11;20(11):e1012547. doi: 10.1371/journal.pcbi.1012547. eCollection 2024 Nov.

Abstract

Image-based cell profiling is a powerful tool that compares perturbed cell populations by measuring thousands of single-cell features and summarizing them into profiles. Typically a sample is represented by averaging across cells, but this fails to capture the heterogeneity within cell populations. We introduce CytoSummaryNet: a Deep Sets-based approach that improves mechanism of action prediction by 30-68% in mean average precision compared to average profiling on a public dataset. CytoSummaryNet uses self-supervised contrastive learning in a multiple-instance learning framework, providing an easier-to-apply method for aggregating single-cell feature data than previously published strategies. Interpretability analysis suggests that the model achieves this improvement by downweighting small mitotic cells or those with debris and prioritizing large uncrowded cells. The approach requires only perturbation labels for training, which are readily available in all cell profiling datasets. CytoSummaryNet offers a straightforward post-processing step for single-cell profiles that can significantly boost retrieval performance on image-based profiling datasets.
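To make the contrast between average profiling and a Deep Sets-style summary concrete, here is a minimal NumPy sketch. It is not the CytoSummaryNet implementation: the function names, layer sizes, the choice of mean pooling, and the random (untrained) weights are all assumptions made for illustration, and the contrastive training step is not shown. It only demonstrates the structural idea of transforming each cell, pooling in an order-independent way, and mapping the pooled vector to a profile.

```python
import numpy as np

def average_profile(cells):
    """Baseline: mean of each feature across all cells in a sample."""
    return cells.mean(axis=0)

def deep_sets_profile(cells, w_phi, w_rho):
    """Permutation-invariant summary of the form rho(pool(phi(x_i)))."""
    hidden = np.maximum(cells @ w_phi, 0.0)  # phi: per-cell linear layer + ReLU
    pooled = hidden.mean(axis=0)             # pooling that ignores cell order
    return pooled @ w_rho                    # rho: linear map to the profile

# Synthetic stand-in for one sample: 200 cells x 16 features (hypothetical sizes).
rng = np.random.default_rng(0)
cells = rng.normal(size=(200, 16))

# Untrained random weights; a real model would learn these from perturbation labels.
w_phi = rng.normal(size=(16, 32))
w_rho = rng.normal(size=(32, 8))

profile = deep_sets_profile(cells, w_phi, w_rho)

# Shuffling the cells should leave the summary unchanged (up to float rounding).
shuffled = deep_sets_profile(cells[rng.permutation(len(cells))], w_phi, w_rho)
```

Because the nonlinearity is applied per cell before pooling, a trained model of this shape can weight some cells more than others, which a plain average cannot do.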

MeSH terms

  • Algorithms
  • Computational Biology* / methods
  • Humans
  • Image Processing, Computer-Assisted* / methods
  • Machine Learning
  • Single-Cell Analysis* / methods