A large-scale comparative assessment of methods for residue-residue contact prediction

Qiqige Wuyun; Wei Zheng; Zhenling Peng; Jianyi Yang

doi:10.1093/bib/bbw106

Brief Bioinform. 2018 Mar 1;19(2):219-230. doi: 10.1093/bib/bbw106.

Authors

Qiqige Wuyun¹, Wei Zheng¹, Zhenling Peng², Jianyi Yang¹

Affiliations

¹ School of Mathematical Sciences, Nankai University, Tianjin, China.
² Center for Applied Mathematics, Tianjin University, Tianjin, China.

PMID: 27802931
DOI: 10.1093/bib/bbw106

Abstract

Sequence-based prediction of residue-residue contacts in proteins is becoming increasingly important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a large data set of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, alignment depth and the distribution of contact pairs in a protein structure. We found that: (1) The machine learning-based methods outperform the direct-coupling-based methods for short-range contact prediction, while the latter are significantly better for long-range contact prediction. The consensus-based methods, which combine machine learning and direct-coupling methods, perform the best. (2) Target difficulty has no clear influence on the machine learning-based methods, but it significantly affects the direct-coupling-based and consensus-based methods. (3) Alignment depth has a relatively weak effect on the machine learning-based methods. However, for the direct-coupling-based and consensus-based methods, the predicted contacts for targets with deeper alignments tend to be more accurate. (4) All methods perform relatively better on β and α + β proteins than on α proteins. (5) Residues buried in the core of a protein structure are more likely to be in contact than residues on the surface (22% versus 6%). We believe these results are useful for guiding the future development of new approaches to contact prediction.
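
The precision measure discussed above can be illustrated with a minimal sketch. The abstract does not give the evaluation details, so everything specific here is an assumption: the sequence-separation ranges (short 6-11, medium 12-23, long 24 or more residues apart) follow a common CASP-style convention, and the function name `top_k_precision`, the `RANGES` table and the toy data are hypothetical, not the authors' code.

```python
from typing import Dict, List, Set, Tuple

# Assumed sequence-separation ranges (a common convention; not stated in the abstract).
RANGES: Dict[str, Tuple[int, float]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, float("inf")),
}


def top_k_precision(
    predicted: List[Tuple[int, int, float]],
    native: Set[Tuple[int, int]],
    seq_range: str,
    k: int,
) -> float:
    """Fraction of the k highest-scoring predicted pairs in a range that are native contacts.

    predicted: (i, j, score) triples, higher score meaning more confident.
    native:    set of (i, j) pairs with i < j that are in contact in the structure.
    """
    lo, hi = RANGES[seq_range]
    # Keep only pairs whose sequence separation falls in the requested range.
    in_range = [p for p in predicted if lo <= abs(p[0] - p[1]) <= hi]
    # Rank by confidence and keep the top k.
    top = sorted(in_range, key=lambda p: p[2], reverse=True)[:k]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in native)
    return hits / len(top)


# Toy data, invented purely for illustration.
predicted = [(1, 30, 0.9), (2, 40, 0.8), (5, 12, 0.7), (3, 50, 0.4)]
native = {(1, 30), (5, 12)}

long_precision = top_k_precision(predicted, native, "long", 2)
short_precision = top_k_precision(predicted, native, "short", 1)
```

In practice k is often tied to the protein length L (for example the top L/5 predictions), which is another convention assumed here rather than taken from the abstract.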

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology / methods
Humans
Models, Molecular
Protein Conformation
Protein Folding
Protein Interaction Domains and Motifs*
Proteins / chemistry
Proteins / metabolism*
Sequence Analysis, Protein / methods*

Substances

Proteins