scCAD: Cluster decomposition-based anomaly detection for rare cell identification in single-cell expression data

Nat Commun. 2024 Aug 31;15(1):7561. doi: 10.1038/s41467-024-51891-9.

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for characterizing cellular landscapes within complex tissues. Large-scale single-cell transcriptomics holds great potential for identifying rare cell types critical to the pathogenesis of diseases and biological processes. Existing methods for identifying rare cell types often rely on one-time clustering using partial or global gene expression. However, these rare cell types may be overlooked during the clustering phase, posing challenges for their accurate identification. In this paper, we propose a Cluster decomposition-based Anomaly Detection method (scCAD), which iteratively decomposes clusters based on the most differential signals in each cluster to effectively separate rare cell types and achieve accurate identification. We benchmark scCAD on 25 real-world scRNA-seq datasets, demonstrating its superior performance compared to 10 state-of-the-art methods. In-depth case studies across diverse datasets, including mouse airway, brain, intestine, human pancreas, immunology data, and clear cell renal cell carcinoma, showcase scCAD's efficiency in identifying rare cell types in complex biological scenarios. Furthermore, scCAD can correct the annotation of rare cell types and identify immune cell subtypes associated with disease, thereby offering valuable insights into disease progression.
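
The idea of iteratively decomposing clusters on each cluster's own strongest signals can be illustrated with a minimal sketch. This is not the scCAD implementation: per-cluster gene variance stands in for "most differential signals", a small deterministic 2-means stands in for the clustering step, and every function name, parameter, and threshold below (`two_means`, `decompose`, `flag_rare`, `n_top_genes`, `min_gap`, `min_size`, `max_fraction`) is a hypothetical choice made for illustration on synthetic data.

```python
import numpy as np

def two_means(Z, n_iter=20):
    """Deterministic 2-means with farthest-point seeding (illustrative stand-in)."""
    c0 = Z[np.argmax(np.linalg.norm(Z - Z.mean(axis=0), axis=1))]
    c1 = Z[np.argmax(np.linalg.norm(Z - c0, axis=1))]
    centers = np.stack([c0, c1])
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels

def decompose(X, n_top_genes=5, min_gap=3.0, min_size=5, max_rounds=5):
    """Repeatedly split each cluster on its own most variable genes.

    X is a cells-by-genes matrix. A split is kept only if the two centroids
    are at least `min_gap` pooled standard deviations apart (assumed criterion).
    Clusters with fewer than 2 * min_size cells are not split further.
    """
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(max_rounds):
        changed = False
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            if len(idx) < 2 * min_size:
                continue
            sub = X[idx]
            # Genes with the highest variance inside this cluster only.
            top = np.argsort(sub.var(axis=0))[-n_top_genes:]
            Z = sub[:, top]
            split = two_means(Z)
            if np.bincount(split, minlength=2).min() < min_size:
                continue
            means = [Z[split == k].mean(axis=0) for k in (0, 1)]
            within = sum(((Z[split == k] - means[k]) ** 2).sum() for k in (0, 1))
            spread = np.sqrt(within / len(Z)) + 1e-9
            if np.linalg.norm(means[0] - means[1]) / spread < min_gap:
                continue
            labels[idx[split == 1]] = labels.max() + 1
            changed = True
        if not changed:
            break
    return labels

def flag_rare(labels, max_fraction=0.05):
    """Mark cells that belong to clusters holding at most `max_fraction` of all cells."""
    counts = np.bincount(labels)
    rare_clusters = np.flatnonzero(counts <= max_fraction * len(labels))
    return np.isin(labels, rare_clusters)

# Synthetic example: 200 common cells plus 8 rare cells that differ in 3 of 50 genes.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(208, 50))
X[200:, :3] += 10.0
labels = decompose(X)
is_rare = flag_rare(labels)
print(np.flatnonzero(is_rare))
```

In this toy setting the rare cells separate on the first split because their three marker genes dominate the within-cluster variance. Real scRNA-seq data would need normalization, a more careful choice of differential genes, and a more robust clustering step than this sketch provides.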

MeSH terms

  • Algorithms
  • Animals
  • Cluster Analysis
  • Computational Biology / methods
  • Gene Expression Profiling / methods
  • Humans
  • Mice
  • Pancreas / cytology
  • Pancreas / metabolism
  • Pancreas / pathology
  • RNA-Seq / methods
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods
  • Transcriptome