GCmapCrys: Integrating graph attention network with predicted contact map for multi-stage protein crystallization propensity prediction

Anal Biochem. 2023 Feb 15;663:115020. doi: 10.1016/j.ab.2022.115020. Epub 2022 Dec 12.

Abstract

X-ray crystallography is the major approach for atomic-level protein structure determination. Because not all proteins crystallize easily, accurate prediction of protein crystallization propensity is critical for guiding experimental design and improving the success rate of X-ray crystallography experiments. In this work, we propose a new deep learning pipeline, GCmapCrys, for multi-stage crystallization propensity prediction that integrates a graph attention network with predicted protein contact maps. Experimental results on 1548 proteins with known crystallization records demonstrate that GCmapCrys increases the Matthews correlation coefficient by 37.0% on average compared with state-of-the-art protein crystallization propensity predictors. Detailed analyses show that the major advantage of GCmapCrys lies in the efficiency of the graph attention network with the predicted contact map, which effectively associates residue-interaction knowledge with crystallization patterns. In addition, the four designed sequence-based features are complementary and further enhance crystallization propensity prediction.
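
For readers unfamiliar with the evaluation metric, the Matthews correlation coefficient (MCC) for a binary predictor can be computed from confusion-matrix counts as in the minimal sketch below. The function names and the example numbers are illustrative assumptions, not values or code from the paper, and the sketch assumes the reported 37.0% is a relative (not absolute) increase in MCC.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (a common convention,
    assumed here), since the denominator would otherwise be zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_gain_percent(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Hypothetical counts for one crystallization stage (not from the paper).
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)

# Hypothetical MCC values showing how a relative gain would be expressed.
gain = relative_gain_percent(new_score=0.6, baseline_score=0.5)
```

MCC ranges from -1 (total disagreement) to 1 (perfect prediction) and stays informative when the crystallizable and non-crystallizable classes are imbalanced, which is one reason it is commonly reported for this task.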

Keywords: Graph attention network; Protein contact map; Protein crystallization propensity prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology* / methods
  • Crystallization / methods
  • Crystallography, X-Ray
  • Proteins* / chemistry

Substances

  • Proteins