Prediction of Time-Series Transcriptomic Gene Expression Based on Long Short-Term Memory with Empirical Mode Decomposition

Int J Mol Sci. 2022 Jul 7;23(14):7532. doi: 10.3390/ijms23147532.

Abstract

RNA degradation can significantly affect the results of gene expression profiling, so that subsequent analysis fails to faithfully represent the initial gene expression level. An artificial intelligence approach is urgently needed to make better use of limited data and to obtain meaningful, reliable analysis results when data at the target time point are missing. In this study, we propose a method based on signal decomposition and deep learning, named Multi-LSTM. It consists of two main modules. The first decomposes the collected gene expression data with an empirical mode decomposition (EMD) algorithm into a series of sub-modules with different frequencies, which improves data stability and reduces modeling complexity. The second uses long short-term memory (LSTM) as the core predictor to capture the temporal nonlinear relationships embedded in the sub-modules. Finally, the predictions for the individual sub-modules are recombined to obtain the final prediction of time-series transcriptomic gene expression. The results show that EMD can efficiently reduce the nonlinearity of the original data, which provides reliable theoretical support for reducing the complexity and improving the robustness of LSTM models. Overall, the decomposition-combination prediction framework can effectively predict gene expression levels at unknown time points.
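
The decomposition-combination idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: a repeated moving-average split stands in for EMD, and a two-point linear extrapolation stands in for the LSTM predictor, so that the example runs with the Python standard library alone. All function names (`moving_average`, `decompose`, `predict_next`, `multi_component_forecast`) and the window sizes are hypothetical and chosen only for illustration.

```python
def moving_average(x, window):
    """Centered moving average; the window is truncated at the series edges."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def decompose(x, windows=(3, 7)):
    """Stand-in for EMD (assumption): split x into additive components.

    Each pass separates the fluctuation around a smoother trend, so the
    components run from fast to slow and sum back to the original series.
    """
    components = []
    residual = list(x)
    for w in windows:
        trend = moving_average(residual, w)
        components.append([r - t for r, t in zip(residual, trend)])
        residual = trend
    components.append(residual)  # slowest component, analogous to a residue
    return components


def predict_next(component):
    """Stand-in for the per-component LSTM (assumption): linear extrapolation."""
    if len(component) < 2:
        return component[-1]
    return 2 * component[-1] - component[-2]


def multi_component_forecast(x):
    """Decompose, predict each component separately, then sum the predictions."""
    return sum(predict_next(c) for c in decompose(x))


if __name__ == "__main__":
    # Made-up expression values at consecutive time points, for illustration only.
    series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
    print(multi_component_forecast(series))
```

Because the stand-in predictor is linear, the recombined forecast here equals the same extrapolation applied to the raw series. Any benefit from decomposition would come from fitting a nonlinear learner, such as an LSTM, to each component separately.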

Keywords: empirical mode decomposition; gene expression; intrinsic mode functions; long short-term memory; time-series.

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Memory, Short-Term*
  • Time Factors
  • Transcriptome*