Knowledge database assisted gene marker selection for chronic lymphocytic leukemia

J Int Med Res. 2018 Aug;46(8):3358-3364. doi: 10.1177/0300060518783072. Epub 2018 Jul 12.

Abstract

Objective: To investigate whether previously curated chronic lymphocytic leukemia (CLL) risk genes could be leveraged in gene marker selection for the diagnosis and prediction of CLL.

Methods: A CLL genetic database (CLL_042017) was developed through a comprehensive analysis of CLL-gene relation data, in which 753 CLL target genes were curated. Expression values for these genes were used for case-control classification of four CLL datasets, with a sparse representation-based variable selection (SRVS) approach employed for feature (gene) selection. Results were compared with those obtained using analysis of variance (ANOVA)-based gene selection.

Results: For each of the four datasets, SRVS selected a subset of the 753 CLL target genes. These subsets yielded classification accuracies of 100%, 100%, 93.94%, and 89.39% across the four datasets, significantly higher than those obtained with randomly selected genes. The SRVS method also outperformed ANOVA in terms of classification accuracy.

Conclusion: Gene markers selected from the 753 CLL genes could enable significantly greater accuracy in the prediction of CLL than randomly selected genes. SRVS provides an effective method for gene marker selection.
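As an illustration of the ANOVA-based gene selection used as the comparison baseline, the following is a minimal sketch, not the authors' implementation and not SRVS. It ranks genes by a one-way ANOVA F statistic between case and control samples and keeps the top-scoring ones. The function names (`anova_f`, `rank_genes`, `make_synthetic`), the synthetic expression data, and all parameter values are assumptions made for illustration only; none of them come from the study.

```python
import random
from statistics import mean


def anova_f(case_values, control_values):
    """One-way ANOVA F statistic for one gene across two groups."""
    groups = [case_values, control_values]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares (degrees of freedom: k - 1).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (degrees of freedom: n_total - k).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def rank_genes(expression, labels, top_k):
    """Return the indices of the top_k genes by F statistic.

    expression: list of samples, each a list of per-gene values
    labels: 1 for case, 0 for control (hypothetical encoding)
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        cases = [row[g] for row, y in zip(expression, labels) if y == 1]
        controls = [row[g] for row, y in zip(expression, labels) if y == 0]
        scores.append((anova_f(cases, controls), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:top_k]]


def make_synthetic(n_per_group=20, n_genes=50, n_informative=3, seed=0):
    """Synthetic data only: the first n_informative genes are shifted in cases."""
    rng = random.Random(seed)
    expression, labels = [], []
    for y in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if y == 1:
                for g in range(n_informative):
                    row[g] += 4.0  # assumed effect size, chosen for illustration
            expression.append(row)
            labels.append(y)
    return expression, labels


expression, labels = make_synthetic()
selected = rank_genes(expression, labels, top_k=3)
print(sorted(selected))
```

In practice the selected gene indices would feed a classifier evaluated by cross-validation. A univariate filter like this scores each gene independently, whereas a sparse representation approach such as SRVS selects genes jointly.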

Keywords: Chronic lymphocytic leukemia (CLL); case-control classification; disease prediction; gene markers; genetic databases; sparse representation; variable selection.

MeSH terms

  • Case-Control Studies
  • Databases, Genetic*
  • Genetic Markers*
  • Genetic Testing
  • Humans
  • Leukemia, Lymphocytic, Chronic, B-Cell / diagnosis*
  • Leukemia, Lymphocytic, Chronic, B-Cell / genetics*

Substances

  • Genetic Markers