Riemannian geometry and statistical modeling correct for batch effects and control false discoveries in single-cell surface protein count data

Phys Rev E. 2020 Jul;102(1-1):012409. doi: 10.1103/PhysRevE.102.012409.

Abstract

Recent advances in next-generation sequencing-based single-cell technologies have allowed high-throughput quantitative detection of cell-surface proteins along with the transcriptome in individual cells, extending our understanding of the heterogeneity of cell populations in diverse tissues that are in different diseased states or under different experimental conditions. Count data of surface proteins from the cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) technology pose new computational challenges, and there is currently a dearth of rigorous mathematical tools for analyzing the data. This work utilizes concepts and ideas from Riemannian geometry to remove batch effects between samples and develops a statistical framework for distinguishing positive signals from background noise. The strengths of these approaches are demonstrated on two independent CITE-seq data sets in mouse and human.

MeSH terms

  • Animals
  • False Positive Reactions
  • Gene Expression Profiling
  • Humans
  • Membrane Proteins / genetics
  • Membrane Proteins / metabolism*
  • Mice
  • Models, Biological*
  • Single-Cell Analysis*

Substances

  • Membrane Proteins