Analyzing scRNA-seq data by CCP-assisted UMAP and tSNE

Yuta Hozumi; Guo-Wei Wei

doi:10.1371/journal.pone.0311791


PLoS One. 2024 Dec 13;19(12):e0311791. doi: 10.1371/journal.pone.0311791. eCollection 2024.

Authors

Yuta Hozumi¹, Guo-Wei Wei¹,²,³

Affiliations

¹ Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America.
² Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan, United States of America.
³ Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America.

Abstract

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal cellular heterogeneity, providing insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging because the data are sparse and involve a large number of genes. Dimensionality reduction and feature selection are therefore important for removing spurious signals and enhancing downstream analysis. Correlated clustering and projection (CCP) was recently introduced as an effective method for preprocessing scRNA-seq data. CCP uses gene-gene correlations to partition the genes and then, based on this partition, uses cell-cell interactions to obtain super-genes. Because CCP is a data-domain approach that does not require matrix diagonalization, it can be used in many downstream machine learning tasks. In this work, we use CCP as an initialization tool for uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (tSNE). Using 21 publicly available datasets, we found that CCP significantly improves UMAP and tSNE visualizations and dramatically improves their accuracy. More specifically, CCP improves UMAP by 22% in ARI, 14% in NMI and 15% in ECM, and improves tSNE by 11% in ARI, 9% in NMI and 8% in ECM.
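The abstract describes CCP only at a high level. Genes are partitioned using gene-gene correlations, and within each partition, cell-cell interactions are aggregated into one super-gene value per cell. The Python sketch below illustrates that two-step idea under stated assumptions. It is not the authors' implementation.

The sketch makes the following substitutions, each labeled as hypothetical:

- Gene partitioning uses plain k-means on the rows of the gene-gene correlation matrix.
- The cell-cell interaction is a generalized exponential kernel, exp(-(d/eta)^kappa), summed over all other cells.
- The helper names `kmeans_rows` and `super_genes`, and the parameters `eta` and `kappa`, are stand-ins.

The code builds dense cells × cells distance arrays, so it is only suitable for small toy matrices.

```python
import numpy as np


def kmeans_rows(X, k, n_iter=50, seed=0):
    """Lloyd's k-means on the rows of X; returns one integer label per row.

    Hypothetical stand-in for CCP's gene-partitioning step.
    """
    k = min(k, len(X))
    rng = np.random.default_rng(seed)
    # start from k distinct rows chosen at random (fancy indexing copies)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distance from every row to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # an emptied cluster keeps its previous center
                centers[c] = members.mean(axis=0)
    return labels


def super_genes(expr, n_groups=5, eta=None, kappa=2.0, seed=0):
    """Map a cells x genes matrix to a cells x n_groups super-gene matrix.

    Assumption: each super-gene is a kernel-weighted sum of a cell's
    interactions with all other cells, computed on one gene group only.
    """
    expr = np.asarray(expr, dtype=float)
    # gene-gene Pearson correlation; constant genes give NaN, mapped to 0
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.corrcoef(expr, rowvar=False))
    labels = kmeans_rows(corr, n_groups, seed=seed)

    out = np.zeros((expr.shape[0], n_groups))
    for g in range(n_groups):
        sub = expr[:, labels == g]
        if sub.shape[1] == 0:
            continue  # empty gene group: leave this super-gene at zero
        # pairwise cell-cell Euclidean distances on this gene group only
        diff = sub[:, None, :] - sub[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        scale = eta if eta is not None else dist.mean() + 1e-12
        kernel = np.exp(-((dist / scale) ** kappa))  # values in [0, 1]
        np.fill_diagonal(kernel, 0.0)  # drop each cell's self-interaction
        out[:, g] = kernel.sum(axis=1)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    counts = rng.poisson(1.0, size=(60, 40))  # toy cells x genes counts
    S = super_genes(counts, n_groups=4)
    print(S.shape)  # (60, 4)
```

In this sketch, the resulting cells × super-genes matrix would simply be passed to a UMAP or tSNE implementation. The abstract does not specify exactly how the authors couple CCP to UMAP and tSNE.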

Copyright: © 2024 Hozumi, Wei. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Algorithms*
Cluster Analysis
Computational Biology / methods
Gene Expression Profiling / methods
Humans
Machine Learning
RNA-Seq / methods
Sequence Analysis, RNA / methods
Single-Cell Analysis* / methods
Single-Cell Gene Expression Analysis
Software
