Machine learning analyses of methylation profiles uncovers tissue-specific gene expression patterns in wheat

Plant Genome. 2020 Jul;13(2):e20027. doi: 10.1002/tpg2.20027. Epub 2020 Jun 3.

Abstract

DNA methylation is a mechanism of epigenetic modification in eukaryotic organisms. Generally, methylation within gene promoters inhibits regulatory protein binding and represses transcription, whereas gene body methylation is associated with actively transcribed genes. However, it remains unclear whether there is interaction between methylation levels across genic regions and which site has the biggest impact on gene regulation. We investigated and used the methylation patterns of the bread wheat cultivar Chinese Spring to uncover differentially expressed genes (DEGs) between roots and leaves, using six machine learning algorithms and a deep neural network. As anticipated, genes with higher expression in leaves were mainly involved in photosynthesis and pigment biosynthesis processes, whereas genes that were not differentially expressed between roots and leaves were involved in protein processes and membrane structures. Methylation occurred preponderantly (60%) in the CG context, whereas 35% and 5% of methylation occurred in the CHG and CHH contexts, respectively. Methylation levels were highly correlated (r = 0.7 to 0.9) between all genic regions, except within the promoter (r = 0.4 to 0.5). Machine learning models gave a high (0.81) prediction accuracy of DEGs. There was a strong correlation (p-value = 9.20 × 10⁻¹⁰) between all features and gene expression, suggesting that methylation across all genic regions contributes to gene regulation. However, the methylation of the promoter, the CDS and the exon in the CG context was the most impactful. Our study provides more insights into the interplay between DNA methylation and gene expression and paves the way for identifying tissue-specific genes using methylation profiles.
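
The two quantities the abstract reports, the sequence context of a methylated cytosine (CG, CHG or CHH, where H is A, C or T) and the Pearson correlation of methylation levels between genic regions, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `cytosine_context` and `pearson_r`, the forward-strand-only simplification, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cytosine_context(seq, i):
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH'.

    Hypothetical helper: forward strand only; returns None when the base
    is not C, when too few downstream bases exist, or when a downstream
    base needed for the call is ambiguous (e.g. N).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or any(b not in "ACGT" for b in downstream):
        return None
    # The first downstream base is now known to be H (A, C or T).
    return "CHG" if downstream[1] == "G" else "CHH"


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy sequence (made up for illustration, not wheat data).
toy_seq = "CGCAGCTTAC"
context_counts = Counter(
    ctx
    for ctx in (cytosine_context(toy_seq, i) for i in range(len(toy_seq)))
    if ctx is not None
)
```

With real data, `pearson_r` would be applied to per-gene methylation levels of two regions (for example exon versus CDS) to obtain correlations of the kind reported above, and the context counts would be taken over methylated cytosines only.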

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Methylation*
  • Epigenesis, Genetic
  • Machine Learning
  • Promoter Regions, Genetic
  • Triticum* / genetics