Dynamic models of gene expression are urgently needed. In this paper, we describe the time evolution of gene expression by learning a jump diffusion process that models the biological process directly. Our algorithm takes aggregate gene expression data as input and outputs the parameters of the jump diffusion process. The learned process can predict population distributions of gene expression at any developmental stage, generate long-time trajectories for individual cells, and offer a novel approach to computing RNA velocity. It also allows biological systems to be studied from the perspective of stochastic dynamics. Gene expression data at a single time point, which is a snapshot of a cellular process, is treated as an empirical marginal distribution of a stochastic process. The dynamics are learned by minimizing the Wasserstein distance between this empirical distribution and the distribution predicted by the jump diffusion process. In the learned process, trajectories correspond to the developmental paths of cells, the stochasticity determines the heterogeneity of cells, and the instantaneous rate of state change can be taken as "RNA velocity"; changes in the scales and orientations of clusters can also be observed. On both synthetic and real-world datasets, we demonstrate that our method recovers the underlying nonlinear dynamics better than previous parametric models and diffusion processes driven by Brownian motion. Our method is also robust to perturbations of the data because the computation involves only population expectations.
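The learning principle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a one-dimensional state, a hypothetical mean-reverting drift with a single parameter `theta`, compound Poisson jumps with Gaussian sizes, and a plain grid search in place of the actual optimization. All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_jump_diffusion(x0, theta, t_end, rng, sigma=0.3,
                            jump_rate=1.0, jump_scale=0.5, dt=0.01):
    """Euler-Maruyama simulation of dX = theta*(1 - X) dt + sigma dW + dJ.

    J is a compound Poisson process with intensity `jump_rate` and
    N(0, jump_scale^2) jump sizes. The drift form is an assumption made
    for illustration only.
    """
    x = np.array(x0, dtype=float)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        drift = theta * (1.0 - x)
        brownian = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        # Number of jumps per cell in this step; the sum of n Gaussian
        # jumps is Gaussian with variance n * jump_scale^2.
        n_jumps = rng.poisson(jump_rate * dt, size=x.shape)
        jumps = jump_scale * np.sqrt(n_jumps) * rng.standard_normal(x.shape)
        x = x + drift * dt + brownian + jumps
    return x


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    # In one dimension the optimal coupling matches sorted samples.
    return float(np.mean(np.abs(a - b)))


def fit_theta(observed, x0, t_end, candidates, rng):
    """Pick the candidate drift parameter whose simulated snapshot is
    closest to the observed snapshot in Wasserstein distance."""
    losses = {}
    for theta in candidates:
        predicted = simulate_jump_diffusion(x0, theta, t_end, rng)
        losses[theta] = wasserstein_1d(observed, predicted)
    best = min(losses, key=losses.get)
    return best, losses


# Usage: build a synthetic "observed" snapshot, then recover theta.
rng = np.random.default_rng(0)
x0 = np.zeros(2000)                       # all cells start at the same state
observed = simulate_jump_diffusion(x0, 2.0, 1.0, rng)
best_theta, losses = fit_theta(observed, x0, 1.0, [0.5, 1.0, 2.0, 4.0], rng)
```

Only the population-level snapshot enters the loss, so no individual cell needs to be tracked over time. A realistic setting would involve high-dimensional expression vectors, several snapshot times, and a gradient-based optimizer, none of which this sketch attempts.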
Keywords: Aggregate data; Gene dynamics; Stochastic modeling; Wasserstein distance.
Copyright © 2021 Elsevier Ltd. All rights reserved.