
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

Authors :
Jinbo Xu
Sheng Wang
Siqi Sun
Zhen Li
Renyu Zhang
Source :
PLoS Computational Biology, Vol 13, Iss 1, p e1005324 (2017)
Publication Year :
2017
Publisher :
Public Library of Science (PLoS), 2017.

Abstract

Motivation: Protein contacts contain key information for understanding protein structure and function, so contact prediction from sequence is an important problem. Exciting progress has recently been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction.
Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question.
Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred, and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it is trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then.
Availability: http://raptorx.uchicago.edu/ContactMap/
Author Summary: Protein contact prediction and contact-assisted folding have made good progress due to direct evolutionary coupling analysis (DCA). However, DCA is effective only on some proteins with a very large number of sequence homologs. To further improve contact prediction, we borrow ideas from deep learning, which has recently revolutionized object recognition, speech recognition and the game of Go. Our deep learning method can model complex sequence-structure relationships and high-order correlation (i.e., contact occurrence patterns) and thus greatly improve contact prediction accuracy. Our test results show that our method greatly outperforms the state-of-the-art methods regardless of how many sequence homologs are available for the protein in question. Ab initio folding guided by our predicted contacts may fold many more test proteins than folding guided by the other contact predictors.
Our contact-assisted 3D models also have much better quality than homology models built from the training proteins, especially for membrane proteins. One interesting finding is that, even when trained mostly with soluble proteins, our method performs very well on membrane proteins. A recent blind CAMEO test confirms that our method can fold large proteins with a new fold and only a small number of sequence homologs.
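The "top L" and "top L/10" long-range accuracies quoted in the abstract can be illustrated with a minimal Python sketch. It assumes the common convention, not stated in this record, that L is the sequence length and that a residue pair is long-range when the two residues are at least 24 positions apart in sequence. The function name, its arguments, and the toy data are hypothetical and are not taken from the authors' software.

```python
import numpy as np


def top_k_long_range_accuracy(pred_prob, true_contact, k, min_sep=24):
    """Fraction of the k highest-scoring long-range pairs that are true contacts.

    pred_prob    : (L, L) array of predicted contact scores (upper triangle is used)
    true_contact : (L, L) boolean array of native contacts
    min_sep      : assumed minimum sequence separation for a long-range pair
    """
    length = pred_prob.shape[0]
    # Residue pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    if i.size == 0:
        return 0.0
    k = max(1, min(k, i.size))  # cannot select more pairs than exist
    # Indices of the k highest-scoring long-range pairs.
    top = np.argsort(-pred_prob[i, j], kind="stable")[:k]
    return float(np.mean(true_contact[i[top], j[top]]))


# Toy example (made-up numbers, not data from the paper).
L = 30
true_contact = np.zeros((L, L), dtype=bool)
for a, b in [(0, 25), (1, 28), (2, 29), (0, 5)]:
    true_contact[a, b] = True

pred_prob = np.zeros((L, L))
pred_prob[0, 5] = 0.99   # short-range pair, ignored by the metric
pred_prob[0, 25] = 0.90  # true long-range contact
pred_prob[1, 28] = 0.80  # true long-range contact
pred_prob[3, 29] = 0.70  # false positive
pred_prob[2, 29] = 0.10  # true contact ranked too low to count

top_l10 = top_k_long_range_accuracy(pred_prob, true_contact, k=L // 10)
print(f"top L/10 long-range accuracy: {top_l10:.2f}")
```

In this sketch k is clamped to the number of available long-range pairs, which matters only for very short sequences; an evaluation that always divides by the nominal k would score such cases differently.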

Subjects

Subjects :
0301 basic medicine
FOS: Computer and information sciences
Computer science
Protein Conformation
Cell Membranes
Protein Structure Prediction
computer.software_genre
Residual
Quantitative Biology - Quantitative Methods
Biochemistry
Machine Learning (cs.LG)
Machine Learning
Database and Informatics Methods
Protein Structure Databases
Protein structure
Mathematical and Statistical Techniques
Statistics - Machine Learning
Protein methods
Sequence Analysis, Protein
Macromolecular Structure Analysis
Databases, Protein
lcsh:QH301-705.5
Quantitative Methods (q-bio.QM)
0303 health sciences
Artificial neural network
Ecology
030302 biochemistry & molecular biology
Protein structure prediction
Computational Theory and Mathematics
Modeling and Simulation
Physical Sciences
Protein folding
Data mining
Cellular Structures and Organelles
Biological system
Algorithm
Algorithms
Statistics (Mathematics)
Research Article
Protein Structure
Computer and Information Sciences
Multiple Alignment Calculation
Neural Networks
Protein contact map
Machine Learning (stat.ML)
Research and Analysis Methods
03 medical and health sciences
Cellular and Molecular Neuroscience
Computational Techniques
Genetics
Homology modeling
Statistical Methods
CASP
Molecular Biology
Ecology, Evolution, Behavior and Systematics
030304 developmental biology
business.industry
Deep learning
Computational Biology
Proteins
Biology and Life Sciences
Membrane Proteins
Biomolecules (q-bio.BM)
Cell Biology
Split-Decomposition Method
Convolution
Computer Science - Learning
030104 developmental biology
Biological Databases
Membrane protein
Quantitative Biology - Biomolecules
De novo protein structure prediction
lcsh:Biology (General)
FOS: Biological sciences
Artificial intelligence
Neural Networks, Computer
Mathematical Functions
Mathematics
Neuroscience
Forecasting

Details

Language :
English
ISSN :
1553-7358
Volume :
13
Issue :
1
Database :
OpenAIRE
Journal :
PLoS Computational Biology
Accession number :
edsair.doi.dedup.....e0a7c72a0606b00d6bcc3c1f18582e9a