Artificial-cell-type aware cell-type classification in CITE-seq

Bioinformatics. 2020 Jul 1;36(Suppl_1):i542-i550. doi: 10.1093/bioinformatics/btaa467.

Abstract

Motivation: Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) couples the measurement of surface marker proteins with simultaneous sequencing of mRNA at the single-cell level, which brings accurate cell surface phenotyping to single-cell transcriptomics. Unfortunately, multiplets in CITE-seq datasets create artificial cell types (ACT) and complicate the automation of cell surface phenotyping.
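
To illustrate how a multiplet can look like a new cell type, the following minimal sketch sums the surface-marker counts of two cells captured in one droplet. It is not taken from the paper: the marker profiles, the noise model, the threshold, and the function names are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical mean surface-marker counts per cell type: (marker_1, marker_2).
PROFILES = {"A": (200.0, 5.0), "B": (5.0, 200.0)}
THRESHOLD = 50.0  # hypothetical cutoff separating "-" from "+"


def simulate_droplet(cell_types, profiles, rng):
    """Sum noisy marker counts over every cell captured in one droplet."""
    n_markers = len(next(iter(profiles.values())))
    counts = [0.0] * n_markers
    for cell_type in cell_types:
        for i, mean in enumerate(profiles[cell_type]):
            # Bounded +/-20% noise around the mean (an assumed noise model).
            counts[i] += rng.uniform(0.8 * mean, 1.2 * mean)
    return counts


def sign_pattern(counts, threshold):
    """Summarize a droplet as a +/- pattern over its markers."""
    return tuple("+" if c > threshold else "-" for c in counts)


rng = random.Random(0)
singlet_a = simulate_droplet(["A"], PROFILES, rng)
singlet_b = simulate_droplet(["B"], PROFILES, rng)
doublet_ab = simulate_droplet(["A", "B"], PROFILES, rng)  # two cells, one droplet

for name, droplet in [("A", singlet_a), ("B", singlet_b), ("A+B", doublet_ab)]:
    print(name, sign_pattern(droplet, THRESHOLD))
```

In this toy setting the A+B droplet is positive for both markers, a pattern that neither real cell type shows, so a clustering method unaware of multiplets could report it as a separate cell type.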

Results: We propose CITE-sort, an artificial-cell-type aware surface marker clustering method for CITE-seq. CITE-sort is aware of and is robust to multiplet-induced ACT. We benchmarked CITE-sort with real and simulated CITE-seq datasets and compared CITE-sort against canonical clustering methods. We show that CITE-sort produces the best clustering performance across the board. CITE-sort not only accurately identifies real biological cell types (BCT) but also consistently and reliably separates multiplet-induced artificial-cell-type droplet clusters from real BCT droplet clusters. In addition, CITE-sort organizes its clustering process with a binary tree, which facilitates easy interpretation and verification of its clustering result and simplifies cell-type annotation with domain knowledge in CITE-seq.
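
The interpretability benefit of organizing clusters in a binary tree can be sketched with a toy example. The code below is not the CITE-sort algorithm, whose splitting criterion is not described in this abstract. It uses a simple largest-gap rule as a stand-in, and the data, the `min_gap` parameter, and all function names are hypothetical.

```python
def best_split(rows, n_markers, min_gap):
    """Return (marker, threshold) for the widest gap in any marker, or None."""
    best = None
    for marker in range(n_markers):
        values = sorted(row[marker] for row in rows)
        for lo, hi in zip(values, values[1:]):
            gap = hi - lo
            if gap >= min_gap and (best is None or gap > best[0]):
                best = (gap, marker, (lo + hi) / 2.0)
    return None if best is None else (best[1], best[2])


def build_tree(rows, n_markers, min_gap):
    """Recursively split droplets into low/high groups on one marker at a time."""
    split = best_split(rows, n_markers, min_gap)
    if split is None:
        return {"leaf": rows}
    marker, threshold = split
    low = [row for row in rows if row[marker] <= threshold]
    high = [row for row in rows if row[marker] > threshold]
    return {
        "marker": marker,
        "threshold": threshold,
        "low": build_tree(low, n_markers, min_gap),
        "high": build_tree(high, n_markers, min_gap),
    }


def leaves(tree, path=()):
    """List (path, rows) per leaf; the path records each marker decision."""
    if "leaf" in tree:
        return [(path, tree["leaf"])]
    marker = tree["marker"]
    return (leaves(tree["low"], path + ((marker, "-"),))
            + leaves(tree["high"], path + ((marker, "+"),)))


# Hypothetical log-scale values for two markers (index 0 and index 1).
droplets = [(5.0, 0.2), (5.2, 0.1), (0.1, 5.1), (0.3, 4.9), (5.1, 5.0), (4.9, 5.2)]
tree = build_tree(droplets, n_markers=2, min_gap=2.0)
result = leaves(tree)
for path, members in result:
    print(path, len(members))
```

Each leaf is described by the sequence of marker decisions that leads to it, which is what makes a tree-structured result easy to check against domain knowledge. In this toy data, the leaf that is positive for both markers is the kind of cluster a user might inspect as a candidate multiplet-induced cluster.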

Availability and implementation: http://github.com/QiuyuLian/CITE-sort.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Epitopes
  • Gene Expression Profiling*
  • Sequence Analysis, RNA
  • Single-Cell Analysis*
  • Software

Substances

  • Epitopes