Phenotype prediction in plants is improved by integrating large-scale transcriptomic datasets

Zefeng Wu; Yali Sun; Xiaoqiang Zhao; Zigang Liu; Wenqi Zhou; Yining Niu

doi:10.1093/nargab/lqae184

NAR Genom Bioinform. 2024 Dec 27;6(4):lqae184. doi: 10.1093/nargab/lqae184. eCollection 2024 Dec.

Authors

Zefeng Wu¹, Yali Sun¹, Xiaoqiang Zhao¹, Zigang Liu¹, Wenqi Zhou², Yining Niu¹

Affiliations

¹ State Key Laboratory of Aridland Crop Science, Gansu Agricultural University, No. 1 Yingmen Village, Anning District, Lanzhou 730070, Gansu Province, China.
² Crop Research Institute, Gansu Academy of Agricultural Sciences, No. 1, New Village, Anning District, Lanzhou 730070, Gansu Province, China.

Abstract

Research on the dynamic expression of genes in plants is important for understanding different biological processes. We used the large amounts of publicly available transcriptomic data from various plant sample sources to investigate whether the expression levels of a subset of highly variable genes (HVGs) can be used to accurately identify the phenotypes of plants. Using maize (Zea mays L.) as an example, we built machine learning (ML) models to predict phenotypes using a gene expression dataset of 21 612 bulk RNA sequencing samples. We showed that the ML models achieved excellent prediction accuracy using only the HVGs to identify different phenotypes, including tissue types, developmental stages, cultivars and stress conditions. Using the ML models, we found several important functional genes to be associated with different phenotypes. We performed a similar analysis in rice (Oryza sativa L.) and found that the ML framework could be generalized across species. However, the models trained on maize did not perform well in rice, probably because of the expression divergence of the conserved HVGs between the two species. Overall, our results provide an ML framework for phenotype prediction using gene expression profiles, which may contribute to precision management of crops in agricultural practices.
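The general idea described in the abstract (select HVGs, then train a classifier on their expression levels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, HVGs are ranked by plain variance, and a nearest-centroid classifier stands in for the ML models, which the abstract does not specify. The function names (select_hvgs, fit_centroids, predict) and the "leaf"/"root" labels are hypothetical and do not come from the paper.

```python
import numpy as np

def select_hvgs(expr, n_top):
    """Return column indices of the n_top genes with the highest variance across samples."""
    variances = expr.var(axis=0)
    return np.argsort(variances)[::-1][:n_top]

def fit_centroids(expr, labels):
    """Compute one mean expression profile (centroid) per phenotype class."""
    classes = np.unique(labels)
    centroids = np.stack([expr[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(expr, classes, centroids):
    """Assign each sample to the class whose centroid is closest (squared Euclidean distance)."""
    dists = ((expr[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic stand-in for an expression matrix (samples x genes); NOT real maize data.
rng = np.random.default_rng(0)
n_per_class, n_genes, n_informative = 100, 500, 20
labels = np.repeat(np.array(["leaf", "root"]), n_per_class)  # hypothetical tissue labels
expr = rng.normal(0.0, 1.0, size=(2 * n_per_class, n_genes))
# Shift the first n_informative genes in one class so they vary strongly across samples.
expr[labels == "root", :n_informative] += 4.0

# Shuffle, then hold out a quarter of the samples for evaluation.
order = rng.permutation(len(labels))
train, test = order[:150], order[150:]

# HVGs are chosen from the training samples only, to avoid leaking test information.
hvg_idx = select_hvgs(expr[train], n_top=50)
classes, centroids = fit_centroids(expr[train][:, hvg_idx], labels[train])
pred = predict(expr[test][:, hvg_idx], classes, centroids)
accuracy = (pred == labels[test]).mean()
print(f"held-out accuracy using {len(hvg_idx)} HVGs: {accuracy:.2f}")
```

On real data, expression values would normally be normalized before HVG selection, and the choice of classifier and of the number of HVGs would need to be validated. The sketch only shows how restricting a model to a small, high-variance gene subset can still separate phenotypes when those genes carry the signal.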

MeSH terms

Databases, Genetic
Gene Expression Profiling / methods
Gene Expression Regulation, Plant
Machine Learning*
Oryza* / genetics
Phenotype
Transcriptome* / genetics
Zea mays* / genetics