Assembling multidomain protein structures through analogous global structural alignments

Xiaogen Zhou; Jun Hu; Chengxin Zhang; Guijun Zhang; Yang Zhang

doi:10.1073/pnas.1905068116


Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15930-15938. doi: 10.1073/pnas.1905068116. Epub 2019 Jul 24.

Authors

Xiaogen Zhou¹˒², Jun Hu¹, Chengxin Zhang², Guijun Zhang³, Yang Zhang⁴˒⁵

Affiliations

¹ College of Information Engineering, Zhejiang University of Technology, 310023 Hangzhou, China.
² Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109.
³ College of Information Engineering, Zhejiang University of Technology, 310023 Hangzhou, China; zgj@zjut.edu.cn, zhng@umich.edu.
⁴ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109; zgj@zjut.edu.cn, zhng@umich.edu.
⁵ Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109.

Abstract

Most proteins in cells contain multiple domains that function cooperatively. However, structural biology and protein folding methods are often optimized for single-domain structures, resulting in a rapidly growing gap between the improved capability for tertiary structure determination and the high demand for multidomain structure models. We have developed a pipeline, termed DEMO, for constructing multidomain protein structures by docking-based domain assembly simulations, with interdomain orientations determined by the distance profiles from analogous templates as detected through domain-level structure alignments. The pipeline was tested on a comprehensive benchmark set of 356 proteins consisting of 2–7 continuous and discontinuous domains, for which DEMO generated models with correct global fold (TM-score > 0.5) for 86% of cases with continuous domains and for 100% of cases with discontinuous domain structures, starting from randomly oriented target-domain structures. DEMO was also applied to reassemble multidomain targets in the CASP12 and CASP13 experiments using domain structures excised from the top server predictions, where the full-length DEMO models showed significantly improved quality over the original server models. Finally, sparse restraints from mass spectrometry-generated cross-linking data and cryo-EM density maps were incorporated into DEMO, improving the average TM-score by 6.3% and 12.5%, respectively. The results demonstrate an efficient approach to assembling multidomain structures, which can be easily used for automated, genome-scale multidomain protein structure assembly.
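The abstract's success criterion, TM-score > 0.5 for a correct global fold, can be made concrete with a short sketch. The Python below is not DEMO code. It assumes the standard TM-score definition (Zhang & Skolnick, 2004), TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, and it scores a single fixed superposition from precomputed per-residue distances, whereas the full TM-score takes the maximum over all superpositions. The function names and the 0.5 Å floor on d0 for short chains are illustrative assumptions.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Angstrom) of the TM-score.

    Assumption: chains of 21 residues or fewer fall back to d0 = 0.5,
    mirroring the floor commonly applied to short chains.
    """
    if target_length <= 21:
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_from_distances(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (hypothetical helper).

    `distances` holds the distance (Angstrom) between each aligned
    model/target residue pair after superposition; unaligned target
    residues simply contribute nothing to the sum. The real TM-score
    maximizes this quantity over superpositions, which is not done here.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalize by the full target length, not by the number of aligned pairs.
    return total / target_length


def has_correct_global_fold(tm_score: float) -> bool:
    """Threshold quoted in the abstract: TM-score > 0.5."""
    return tm_score > 0.5


if __name__ == "__main__":
    # Hypothetical 120-residue target: 90 residues superposed within 1.5 A,
    # the remaining 30 residues unaligned.
    score = tm_score_from_distances([1.5] * 90, 120)
    print(f"TM-score = {score:.3f}, correct fold: {has_correct_global_fold(score)}")
```

Because the sum is divided by the target length L rather than by the number of aligned pairs, missing or unaligned residues lower the score, so a full-length assembly is penalized if any domain is left out.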

Keywords: domain assembly; multidomain protein; multidomain template recognition; protein structure prediction.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Cross-Linking Reagents / chemistry
Cryoelectron Microscopy
Databases, Protein
Models, Molecular
Protein Domains
Proteins / chemistry*
Software

Substances

Cross-Linking Reagents
Proteins
