Integrating unsupervised language model with multi-view multiple sequence alignments for high-accuracy inter-chain contact prediction

Comput Biol Med. 2023 Nov:166:107529. doi: 10.1016/j.compbiomed.2023.107529. Epub 2023 Sep 20.

Abstract

Accurate identification of inter-chain contacts in a protein complex is critical for determining its 3D structure and understanding its biological function. We propose a new deep learning method, ICCPred, to infer inter-chain contacts from the amino acid sequences of a protein complex. The pipeline is built on a purpose-designed deep residual network architecture that integrates a pre-trained language model with three multiple sequence alignments (MSAs) constructed from different biological views. Experimental results on 709 non-redundant benchmark protein complexes show that ICCPred significantly increases inter-chain contact prediction accuracy compared with state-of-the-art approaches. Detailed data analyses show that the main advantage of ICCPred lies in its use of pre-trained transformer language models, which can effectively extract complementary co-evolution diversity from the three MSAs. Meanwhile, the designed deep residual network enhances the correlation between the co-evolution diversity and the patterns of inter-chain contacts. These results demonstrate a new avenue for high-accuracy deep-learning inter-chain contact prediction that is applicable to large-scale annotation of protein-protein interactions from sequence alone.
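
To make the prediction target concrete, the sketch below shows one common way an inter-chain contact map and a top-k precision score can be computed. It is not taken from ICCPred: the abstract does not state the contact definition or the evaluation metric, so the 8 Å distance cutoff, the use of one representative coordinate per residue, and the function names `inter_chain_contacts` and `top_k_precision` are all assumptions made for illustration.

```python
import math

def inter_chain_contacts(coords_a, coords_b, threshold=8.0):
    """Binary contact map between two chains.

    coords_a, coords_b: one (x, y, z) tuple per residue (assumed to be a
    representative atom such as C-beta). The 8.0 A threshold is an assumed,
    commonly used convention, not a value stated in the abstract.
    Returns a len(coords_a) x len(coords_b) list of 0/1 entries.
    """
    return [
        [1 if math.dist(a, b) < threshold else 0 for b in coords_b]
        for a in coords_a
    ]

def top_k_precision(scores, contacts, k):
    """Fraction of the k highest-scoring residue pairs that are true contacts."""
    pairs = [
        (scores[i][j], contacts[i][j])
        for i in range(len(scores))
        for j in range(len(scores[i]))
    ]
    # Rank all inter-chain residue pairs by predicted score, highest first.
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[:k]
    return sum(label for _, label in top) / len(top)

# Toy example: two residues per chain, hypothetical coordinates and scores.
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 5.0), (30.0, 0.0, 0.0)]
true_map = inter_chain_contacts(chain_a, chain_b)
predicted = [[0.9, 0.2], [0.6, 0.1]]
precision_top1 = top_k_precision(predicted, true_map, k=1)
precision_top2 = top_k_precision(predicted, true_map, k=2)
```

In this toy case only the first residue of each chain lies within the assumed cutoff, so the highest-scoring pair is a true contact while the second-highest is not. A sequence-based predictor of the kind described above would produce the `predicted` score matrix from the input sequences rather than from coordinates.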

Keywords: Co-evolution diversity; Deep residual networks; Inter-chain contact prediction; Multiple sequence alignment; Pre-trained language models.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Databases, Protein
  • Deep Learning
  • Humans
  • Proteins* / chemistry
  • Proteins* / genetics
  • Sequence Alignment* / methods
  • Sequence Analysis, Protein / methods
  • Unsupervised Machine Learning

Substances

  • Proteins