Machine learning unravels inherent structural patterns in Escherichia coli Hi-C matrices and predicts chromosome dynamics

Nucleic Acids Res. 2024 Oct 14;52(18):10836-10849. doi: 10.1093/nar/gkae749.

Abstract

The high-dimensional nature of the chromosomal conformation contact map ('Hi-C map'), even for a microscopically small bacterial cell, poses challenges for extracting meaningful information about its complex organization. Here we first demonstrate that a machine-learnt (ML), low-dimensional representation of a recently reported Hi-C interaction map of the archetypal bacterium Escherichia coli, obtained with an artificial deep neural network, can decode crucial underlying structural patterns. The ML-derived representation of the Hi-C map automatically detects a set of spatially distinct domains across the E. coli genome that are reminiscent of the six putative macrodomains previously posited via recombination assays. Subsequently, an ML-generated model assimilates the intricate relationship between the large array of Hi-C-derived chromosomal contact probabilities and the diffusive dynamics of each individual chromosomal gene, and identifies an optimal number of functionally important chromosomal contact pairs that are chiefly responsible for the heterogeneous, coordinate-dependent sub-diffusive motions of chromosomal loci. Finally, the ML models, trained on wild-type E. coli, showcase their predictive capabilities on mutant bacterial strains, shedding light on the structural and dynamic nuances of the ΔMatP30MM and ΔMukBEF22MM chromosomes. Overall, our results illustrate the power of ML techniques in unraveling the complex relationship between the structure and dynamics of bacterial chromosomal loci, promising meaningful connections between ML-derived insights and biological phenomena.
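The idea of compressing a contact map into a few dimensions and then reading domains off the compressed representation can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract describes a deep neural network, whereas this sketch swaps in a linear projection (principal components via SVD) followed by a simple k-means step. The contact map is synthetic, and all function names, the bin count, and the three-domain layout are assumptions made only for illustration.

```python
import numpy as np

def synthetic_contact_map(n_bins=60, n_domains=3, seed=0):
    """Toy symmetric contact map with block structure (NOT real Hi-C data)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_domains), n_bins // n_domains)
    same_domain = labels[:, None] == labels[None, :]
    # High contact probability inside a domain, low outside, plus small noise.
    m = np.where(same_domain, 0.8, 0.1) + 0.02 * rng.random((n_bins, n_bins))
    return (m + m.T) / 2, labels

def low_dim_embedding(contact_map, n_components=2):
    """Linear stand-in for a learnt encoder: project rows onto leading PCs."""
    centered = contact_map - contact_map.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return assign

contact_map, true_domains = synthetic_contact_map()
embedding = low_dim_embedding(contact_map)   # one 2-D point per genomic bin
clusters = kmeans(embedding, k=3)            # candidate domain label per bin
```

On this toy input each block of genomic bins ends up in its own cluster. Real Hi-C maps are noisier and dominated by distance-dependent decay, which is one reason a nonlinear learnt representation may be preferable to the linear projection used here.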

MeSH terms

  • Chromosomes, Bacterial* / chemistry
  • Chromosomes, Bacterial* / genetics
  • Escherichia coli Proteins / chemistry
  • Escherichia coli Proteins / genetics
  • Escherichia coli Proteins / metabolism
  • Escherichia coli* / genetics
  • Genome, Bacterial
  • Machine Learning

Substances

  • Escherichia coli Proteins