1. Protein sequence information encodes more than the global minimum structure
- Author
Schwarz, Dominik and Deane, Charlotte
- Subjects
Computational biology, Protein structure prediction, Structural bioinformatics
- Abstract
Allostery is a change in the conformation or activity of a protein's active site that results from a binding event at a distant, allosteric site. The signal is hypothesised to travel via allosteric networks, and knowing the exact residues that transmit it would benefit the development of allosteric drugs. Such drugs should offer high selectivity, better treatment through combination therapies, or both. We investigated whether co-evolution techniques could be used to identify allosteric residues. While direct coupling analysis (DCA) methods without machine learning, such as EV-Fold, CCMpred and PSICOV, recalled larger numbers of allosteric residues, machine learning-based techniques like MetaPSICOV2 and RaptorX showed higher precision in predicting physical proximity (contact prediction). From this we conclude that different co-evolution methods are likely to extract different constraints on the sequence space. Next, we investigated whether the co-evolutionary distance predictor DMPfold encodes information on conformational flexibility in the shape of its predicted distance distribution for each residue pair. We analysed a set of pairs of PDB structures (2947 proteins) in which the two structures of the same sequence showed different conformations. The pairs were used to approximate residue pair flexibility. We found a statistically significant difference between flexible and rigid residue pairs in terms of their predicted distance distributions. Flexible residue pairs more often had multiple local maxima in their predicted distance distributions, whilst rigid pairs more often had just a single maximum. This highlights the potential of co-evolution-based methods to predict conformational ensembles. In addition to our analyses of co-evolutionary data, we explored other constraints on the sequence space of protein families: rare conformations in protein ensembles as well as folding pathways.
Protein kinases are a family with a vast amount of structural data available, which allows us to observe rare conformations in some kinases that might be accessible to other kinases at an energetic cost. We examined conformational ensembles of kinases that were generated systematically by a novel homology modelling pipeline and assessed the model ensembles' potential for docking studies. In an exploratory docking study with two kinases and five inhibitors, we found the generated models to be suitable for further docking calculations. In the last chapter we describe an initial analysis of folding pathway conservation with TMPfold, a predictor of helical membrane protein folding pathways. We found an indication of folding pathway conservation within families when analysing the predicted helix-helix association energies that form the basis of the folding pathway prediction. Nevertheless, the conservation signal was ambiguous when comparing the predicted pathways directly, suggesting that the predictor itself needs further development before being applied on a larger scale.
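The distance-distribution analysis described in the abstract, which contrasts residue pairs with a single maximum against pairs with several local maxima, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `count_local_maxima`, the `min_height` threshold, the bin layout and the example probabilities are all assumptions made for illustration, and DMPfold's actual output format is not reproduced here.

```python
from typing import Sequence


def count_local_maxima(probs: Sequence[float], min_height: float = 0.05) -> int:
    """Count local maxima in a binned distance distribution.

    probs holds one probability per distance bin for a single residue pair.
    min_height is a hypothetical noise threshold, not a value from the thesis.
    A plateau of equal values is counted as one maximum.
    """
    count = 0
    n = len(probs)
    i = 0
    while i < n:
        # Extend j to the end of a plateau of equal values starting at i.
        j = i
        while j + 1 < n and probs[j + 1] == probs[i]:
            j += 1
        # Bins outside the distribution are treated as lower than any value.
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[j + 1] if j + 1 < n else float("-inf")
        if probs[i] >= min_height and probs[i] > left and probs[i] > right:
            count += 1
        i = j + 1
    return count


# Made-up distributions over seven and six distance bins (illustrative only).
rigid_pair = [0.0, 0.05, 0.7, 0.2, 0.05, 0.0]
flexible_pair = [0.0, 0.3, 0.35, 0.05, 0.02, 0.25, 0.03]

for name, dist in [("rigid", rigid_pair), ("flexible", flexible_pair)]:
    peaks = count_local_maxima(dist)
    label = "multimodal" if peaks >= 2 else "unimodal"
    print(f"{name}: {peaks} local maxima ({label})")
```

Under these assumptions, a pair would be flagged as potentially flexible when its distribution has two or more maxima above the threshold. A real analysis would also need a statistical test across many pairs, as the abstract reports.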
- Published
2021