Gene Selection in a Single Cell Gene Space Based on D-S Evidence Theory

Interdiscip Sci. 2022 Sep;14(3):722-744. doi: 10.1007/s12539-022-00518-y. Epub 2022 Apr 28.

Abstract

If the samples, features and information values of a real-valued information system are cells, genes and gene expression values, respectively, the system is called, for convenience, a single cell gene space. In the era of big data, gene expression data are high-dimensional, and their redundancy and noise make them highly uncertain. D-S evidence theory excels at handling uncertainty, and the conditions it requires are weaker than those of Bayesian probability theory. This paper therefore uses D-S evidence theory to study gene selection in a single cell gene space, with the aim of removing noise and redundancy. First, the distance between two cells with respect to each gene is defined. Then, a tolerance relation is established from this distance. Next, belief and plausibility functions that capture the uncertainty of a single cell gene space are introduced on the basis of the tolerance classes. Statistical analysis shows that they can effectively measure this uncertainty. Furthermore, several gene selection algorithms in a single cell gene space are presented using the proposed belief and plausibility functions. Finally, the performance of the proposed algorithm is compared with that of other algorithms on several published single-cell data sets. Experimental results and statistical tests show that the presented algorithm not only outperforms three other state-of-the-art algorithms in classification and clustering performance, but also achieves a very high gene reduction rate.
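
The pipeline outlined in the abstract (per-gene distance, tolerance relation, belief and plausibility from tolerance classes) can be illustrated with a minimal sketch. The abstract does not give the paper's actual definitions, so everything concrete here is an assumption: the per-gene distance is taken to be the absolute difference of expression values scaled to [0, 1], two cells are treated as tolerant when that distance is at most a threshold `theta` on every selected gene, and belief and plausibility of a set of cells are computed in the usual rough-set manner as the relative sizes of its lower and upper approximations. The function names and the toy data are hypothetical.

```python
def tolerance_classes(cells, genes, theta):
    """Return, for each cell, the set of cells tolerant with it.

    cells: list of rows, one per cell, expression values assumed in [0, 1]
    genes: indices of the genes (columns) under consideration
    theta: assumed distance threshold for the tolerance relation
    """
    n = len(cells)
    classes = []
    for i in range(n):
        # Cell j is tolerant with cell i if they are close on every gene.
        cls = {
            j for j in range(n)
            if all(abs(cells[i][g] - cells[j][g]) <= theta for g in genes)
        }
        classes.append(cls)
    return classes


def belief_plausibility(classes, target):
    """Belief and plausibility of a set of cells (assumed rough-set form).

    belief       = share of cells whose tolerance class lies inside target
    plausibility = share of cells whose tolerance class meets target
    """
    n = len(classes)
    lower = sum(1 for c in classes if c <= target)
    upper = sum(1 for c in classes if c & target)
    return lower / n, upper / n


# Toy data: 4 cells x 2 genes; cells 0 and 1 form the cell type of interest.
cells = [
    [0.10, 0.90],
    [0.15, 0.20],
    [0.80, 0.85],
    [0.85, 0.10],
]
cell_type = {0, 1}

by_gene0 = belief_plausibility(tolerance_classes(cells, [0], 0.2), cell_type)
by_gene1 = belief_plausibility(tolerance_classes(cells, [1], 0.2), cell_type)
print(by_gene0)  # (0.5, 0.5)
print(by_gene1)  # (0.0, 1.0)
```

On this toy data, gene 0 gives equal belief and plausibility for the cell type, meaning it describes the cell type without ambiguity, while gene 1 gives the widest possible gap between the two. A selection procedure in this spirit would favor gene subsets that keep the gap small, although the paper's actual selection criterion may differ.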

Keywords: Belief function; D–S evidence theory; Gene selection; Plausibility function; Single cell gene space; Tolerance relation.

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Cluster Analysis
  • Humans