scMAGIC: accurately annotating single cells using two rounds of reference-based classification

Nucleic Acids Res. 2022 May 6;50(8):e43. doi: 10.1093/nar/gkab1275.

Abstract

Here, we introduce scMAGIC (Single Cell annotation using MArker Genes Identification and two rounds of reference-based Classification [RBC]), a novel method that uses well-annotated single-cell RNA sequencing (scRNA-seq) data as the reference to assist in the classification of query scRNA-seq data. A key innovation in scMAGIC is the introduction of a second-round RBC in which those query cells whose cell identities are confidently validated in the first round are used as a new reference to classify the query cells again, thereby eliminating the batch effects between the reference and the query data. scMAGIC significantly outperforms 13 competing RBC methods with their optimal parameter settings across 86 benchmark tests, especially when the cell types in the query dataset are not completely covered by the reference dataset and when significant batch effects exist between the reference and the query datasets. Moreover, when no reference dataset is available, scMAGIC can annotate query cells with reasonably high accuracy by using an atlas dataset as the reference.
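
The two-round idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the use of Pearson correlation against per-type mean profiles, the two thresholds (`confident`, `assign`), the "Unassigned" label, and all function names are assumptions chosen for illustration, and the marker-gene identification step is omitted entirely.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def centroids(profiles, labels):
    """Mean expression profile per cell type."""
    groups = defaultdict(list)
    for profile, label in zip(profiles, labels):
        groups[label].append(profile)
    return {
        label: [sum(col) / len(cells) for col in zip(*cells)]
        for label, cells in groups.items()
    }


def classify(profiles, cents):
    """Return (best_label, best_score) for each cell."""
    results = []
    for profile in profiles:
        scores = {label: pearson(profile, c) for label, c in cents.items()}
        best = max(scores, key=scores.get)
        results.append((best, scores[best]))
    return results


def two_round_rbc(ref_profiles, ref_labels, query_profiles,
                  confident=0.9, assign=0.5):
    """Hypothetical two-round reference-based classification.

    Round 1 scores query cells against the external reference.
    Round 2 rebuilds the reference from confidently labelled query
    cells, so the final comparison is query-versus-query.
    """
    round1 = classify(query_profiles, centroids(ref_profiles, ref_labels))
    kept = [(p, label) for p, (label, score) in zip(query_profiles, round1)
            if score >= confident]
    if not kept:
        final = round1  # no confident cells: fall back to round 1
    else:
        new_cents = centroids([p for p, _ in kept], [l for _, l in kept])
        final = classify(query_profiles, new_cents)
    return [label if score >= assign else "Unassigned"
            for label, score in final]


# Toy data: four genes, two reference cell types (values are invented).
ref = [[10, 9, 1, 0], [9, 10, 0, 1], [0, 1, 10, 9], [1, 0, 9, 10]]
ref_labels = ["T", "T", "B", "B"]
query = [[12, 8, 3, 1], [11, 9, 2, 2], [2, 1, 12, 9],
         [1, 3, 8, 12], [10, 3, 6, 1], [5, 1, 5, 1]]
labels = two_round_rbc(ref, ref_labels, query)
```

In this toy setting, the fifth query cell scores below the assignment threshold against the external reference but above it against the query-derived reference, which is the kind of batch-related rescue the second round is meant to provide. The sixth cell resembles neither type and stays unassigned, loosely mirroring the case where a query cell type is not covered by the reference.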

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Exome Sequencing
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods