121 results on '"protein structure modeling"'
Search Results
2. Phosphorylation of T897 in the dimerization domain of Gemin5 modulates protein interactions and translation regulation
- Author
-
Rosario Francisco-Velilla, Azman Embarc-Buh, Salvador Abellan, Francisco del Caño-Ochoa, Santiago Ramón-Maiques, Encarnacion Martinez-Salas, Ministerio de Ciencia e Innovación (España), European Commission, Comunidad de Madrid, Fundación Ramón Areces, Generalitat Valenciana, Ramon-Maiques, Santiago, del Caño-Ochoa, Francisco, and Martinez-Salas, Encarnacion
- Subjects
Gemin5 interactome ,SMN complex ,WD-40, tryptophan-aspartic repeat motif ,Phosphoresidues ,TPR-like, tetratricopeptide repeat-like domain ,BiNGO, Biological Networks Gene Ontology application ,Biophysics ,eIF4E, eukaryotic initiation factor 4E ,RNA-binding proteins ,Human variants ,Biochemistry ,CHX, cycloheximide ,Structural Biology ,NEDCAM, neurological disorders with cerebellar atrophy and motor dysfunction ,SMN, survival of motor neurons ,Genetics ,TAP, tandem affinity purification ,SGs, stress granules ,RBS1, RNA-binding site1 ,MD, molecular dynamics ,Computer Science Applications ,IRES, internal ribosome entry site ,snRNAs, small nuclear RNAs ,Protein structure modeling ,RBP, RNA-binding protein ,Protein synthesis ,Neurological disease ,LC-MS/MS, liquid chromatography-mass spectrometry ,Biotechnology
- Abstract
10 pages, 7 figures. Gemin5 is a multifunctional RNA binding protein (RBP) organized in domains with a distinctive structural organization. The protein is a hub for several protein networks performing diverse RNA-dependent functions including regulation of translation, and recognition of small nuclear RNAs (snRNAs). Here we sought to identify the presence of phosphoresidues on the C-terminal half of Gemin5, a region of the protein that harbors a tetratricopeptide repeat (TPR)-like dimerization domain and a non-canonical RNA binding site (RBS1). We identified two phosphoresidues in the purified protein: P-T897 in the dimerization domain and P-T1355 in RBS1. Replacing T897 and T1355 with alanine led to decreased translation, and mass spectrometry analysis revealed that mutation T897A strongly abrogates the association with cellular proteins related to the regulation of translation. In contrast, the phosphomimetic substitutions to glutamate partially rescued the translation regulatory activity. The structural analysis of the TPR dimerization domain indicates that local rearrangements caused by phosphorylation of T897 affect the conformation of the flexible loop 2-3, and propagate across the dimerization interface, impacting the position of the C-terminal helices and the loop 12-13 shown to be mutated in patients with neurological disorders. Computational analysis of the potential relationship between post-translational modifications and currently known pathogenic variants indicates a lack of overlap of the affected residues within the functional domains of the protein and provides molecular insights for the implication of the phosphorylated residues in translation regulation.
This work was supported by the Ministerio de Ciencia e Innovación (MICIN) and Fondo Europeo de Desarrollo Regional (AEI/FEDER UE) (PID2020-115096RB-I00), Comunidad de Madrid (B2017/BMD-3770) and an Institutional grant from Fundación Ramón Areces. FdC is a postdoctoral fellow of the Generalitat Valenciana (APOSTD 2021).
- Published
- 2022
3. Efficient Flexible Fitting Refinement with Automatic Error Fixing for De Novo Structure Modeling from Cryo-EM Density Maps
- Author
-
Daisuke Kihara, Takaharu Mori, Genki Terashi, Yuji Sugita, and Daisuke Matsuoka
- Subjects
010304 chemical physics ,Protein Conformation ,Computer science ,Cryo-electron microscopy ,General Chemical Engineering ,Cryoelectron Microscopy ,Structure (category theory) ,Proteins ,General Chemistry ,Molecular Dynamics Simulation ,Library and Information Sciences ,Overfitting ,01 natural sciences ,0104 chemical sciences ,Computer Science Applications ,Progressive refinement ,010404 medicinal & biomolecular chemistry ,Molecular dynamics ,Structural biology ,0103 physical sciences ,Simulated annealing ,Protein structure modeling ,Algorithm
- Abstract
Structural modeling of proteins from cryo-electron microscopy (cryo-EM) density maps is one of the challenging issues in structural biology. De novo modeling combined with flexible fitting refinement (FFR) has been widely used to build a structure of new proteins. In de novo prediction, artificial conformations containing local structural errors such as chirality errors, cis peptide bonds, and ring penetrations are frequently generated and cannot be easily removed in the subsequent FFR. Moreover, refinement can be significantly suppressed due to the low mobility of atoms inside the protein. To overcome these problems, we propose an efficient scheme for FFR, in which the local structural errors are fixed first, followed by FFR using an iterative simulated annealing (SA) molecular dynamics protocol with the united atom (UA) model in an implicit solvent model; we call this scheme "SAUA-FFR". The best model is selected from multiple flexible fitting runs with various biasing force constants to reduce overfitting. We apply our scheme to the decoys obtained from MAINMAST and demonstrate an improvement of the best model of eight selected proteins in terms of the root-mean-square deviation, MolProbity score, and RWplus score compared to the original scheme of MAINMAST. Fixing the local structural errors can enhance the formation of secondary structures, and the UA model enables progressive refinement compared to the all-atom model owing to its high mobility in the implicit solvent. The SAUA-FFR scheme realizes efficient and accurate protein structure modeling from medium-resolution maps with less overfitting.
- Published
- 2021
4. Association with proteasome determines pathogenic threshold of polyglutamine expansion diseases
- Author
-
Ilya Bezprozvanny and Mee Whi Kim
- Subjects
Models, Molecular ,0301 basic medicine ,Proteasome Endopeptidase Complex ,Huntingtin ,Biophysics ,Cell Biology ,Biology ,Biochemistry ,Protein Structure, Secondary ,Article ,Cell biology ,03 medical and health sciences ,030104 developmental biology ,0302 clinical medicine ,Protein Domains ,Proteasome ,030220 oncology & carcinogenesis ,Ataxin ,Humans ,Disease ,Peptides ,Trinucleotide Repeat Expansion ,Protein structure modeling ,Molecular Biology
- Abstract
Expansion of a glutamine residue track (polyQ) within a soluble protein is responsible for eight autosomal-dominant genetic neurodegenerative disorders. These disorders affect the cerebellum, striatum, basal ganglia and other brain regions. Each disease develops when polyQ expansion exceeds a pathogenic threshold (Q(th)). The pathogenic threshold is unique for each disease, but the reasons for variability in Q(th) within this family of proteins are poorly understood. In a previous publication we proposed that the polarity of the regions flanking the polyQ track in each protein plays a key role in defining the Q(th) value [1]. To explain the correlation between the polarity of the flanking sequences and Q(th), we performed a quantitative analysis of interactions between polyQ-expanded proteins and the proteasome. Based on structural and theoretical modeling, we predict that the Q(th) value is determined by the energy of polar interaction of the flanking regions with the polyQ and the proteasome. More polar flanking regions facilitate unfolding of the α-helical polyQ conformation adopted inside the proteasome and, as a result, increase Q(th). Predictions of our model are consistent with the Q(th) values observed in the clinic for each of the eight polyQ-expansion disorders. Our results suggest that agents that can destabilize the polyQ α-helical structure may have a beneficial therapeutic effect in the treatment of polyQ-expansion disorders.
- Published
- 2021
5. A New Protocol for Atomic-Level Protein Structure Modeling and Refinement Using Low-to-Medium Resolution Cryo-EM Density Maps
- Author
-
Biao Zhang, Yang Zhang, Hong-Bin Shen, Xi Zhang, and Robin Pearce
- Subjects
Models, Molecular ,Correctness ,Protein Conformation ,Computer science ,Cryo-electron microscopy ,Article ,Force field (chemistry) ,03 medical and health sciences ,0302 clinical medicine ,Structural Biology ,Animals ,Humans ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Cryoelectron Microscopy ,Proteins ,Protein structure prediction ,Template modeling score ,Medium resolution ,Structural biology ,Protein structure modeling ,Monte Carlo Method ,Algorithm ,Algorithms ,030217 neurology & neurosurgery
- Abstract
The rapid progress of cryo-electron microscopy (cryo-EM) in structural biology has raised an urgent need for robust methods to create and refine atomic-level structural models using low-resolution EM density maps. We propose a new protocol to create initial models using I-TASSER protein structure prediction, followed by EM density map-based rigid-body structure fitting, flexible fragment adjustment and atomic-level structure refinement simulations. The protocol was tested on a large set of 285 non-homologous proteins and generated structural models with correct folds for 260 proteins, where 28% had RMSDs below 2 Å. Compared to other state-of-the-art methods, the major advantage of the proposed pipeline lies in the uniform structure prediction and refinement protocol, as well as the extensive structural re-assembly simulations, which allow for low-to-medium resolution EM density map-guided structure modeling starting from amino acid sequences. Interestingly, the quality of both the image fitting and subsequent structure refinement was found to be strongly correlated with the correctness of the initial I-TASSER models; this is mainly due to the different correlation patterns observed between force field and structural quality for the models with template modeling score (or TM-score, a metric quantifying the similarity of models to the native) above and below a threshold of 0.5. Overall, the results demonstrate a new avenue that is ready to use for large-scale cryo-EM-based structure modeling and atomic-level density map-guided structure refinement.
- Published
- 2020
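The abstract of result 5 refers to the template modeling score (TM-score) and a threshold of 0.5. As a rough illustration only, the standard TM-score definition can be sketched in Python as below. The function names `tm_d0` and `tm_score` are hypothetical, and the sketch assumes that per-residue distances between aligned model and native residues are already available from one fixed superposition. The full metric takes the maximum over superpositions, which this sketch does not attempt, and nothing here reproduces the authors' pipeline.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) from the standard TM-score definition."""
    if target_length <= 15:
        raise ValueError("target too short for this d0 formula")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for this d0 formula")
    return d0


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (Å) between aligned model/native residue pairs
    target_length: number of residues in the native (target) structure
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy data (not from the paper): a 100-residue target.
length = 100
perfect = [0.0] * length            # every aligned pair coincides
at_scale = [tm_d0(length)] * length  # every aligned pair is exactly d0 apart

print(tm_score(perfect, length))   # identical structures give 1.0
print(tm_score(at_scale, length))  # each term contributes 1/2, giving 0.5
```

Because each residue pair contributes at most 1 and the sum is divided by the target length, the score stays between 0 and 1 regardless of protein size, which is what makes a fixed threshold such as 0.5 meaningful across targets.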
6. High-density chemical cross-linking for modeling protein interactions
- Author
-
Steven P. Gygi and Julian Mintseris
- Subjects
Models, Molecular ,Proteasome Endopeptidase Complex ,Protein Conformation ,Saccharomyces cerevisiae ,Computational biology ,Mass spectrometry ,protein structure modeling ,Protein–protein interaction ,Imaging, Three-Dimensional ,Protein structure ,polycyclic compounds ,Protein Interaction Domains and Motifs ,Macromolecular docking ,mass spectrometry ,chemistry.chemical_classification ,Multidisciplinary ,Chemistry ,Cryoelectron Microscopy ,Proteins ,biochemical phenomena, metabolism, and nutrition ,Amino acid ,enzymes and coenzymes (carbohydrates) ,Cross-Linking Reagents ,Enzyme ,PNAS Plus ,Proteasome ,bacteria ,Chromatin Immunoprecipitation Sequencing ,Peptides ,Function (biology) ,cross-linking
- Abstract
Detailed mechanistic understanding of protein complex function is greatly enhanced by insights from its 3-dimensional structure. Traditional methods of protein structure elucidation remain expensive and labor-intensive and require highly purified starting material. Chemical cross-linking coupled with mass spectrometry offers an alternative that has seen increased use, especially in combination with other experimental approaches like cryo-electron microscopy. Here we report advances in method development, combining several orthogonal cross-linking chemistries as well as improvements in search algorithms, statistical analysis, and computational cost to achieve coverage of 1 unique cross-linked position pair for every 7 amino acids at a 1% false discovery rate. This is accomplished without any peptide-level fractionation or enrichment. We apply our methods to model the complex between a carbonic anhydrase (CA) and its protein inhibitor, showing that the cross-links are self-consistent and define the interaction interface at high resolution. The resulting model suggests a scaffold for development of a class of protein-based inhibitors of the CA family of enzymes. We next cross-link the yeast proteasome, identifying 3,893 unique cross-linked peptides in 3 mass spectrometry runs. The dataset includes 1,704 unique cross-linked position pairs for the proteasome subunits, more than half of them intersubunit. Using multiple recently solved cryo-EM structures, we show that observed cross-links reflect the conformational dynamics and disorder of some proteasome subunits. We further demonstrate that this level of cross-linking density is sufficient to model the architecture of the 19-subunit regulatory particle de novo.
- Published
- 2019
7. CirPred, the first structure modeling and linker design system for circularly permuted proteins
- Author
-
Yen-Cheng Lin, Chih-Chieh Chen, Teng-Ruei Chen, Wei-Cheng Lo, and Yu-Wei Huang
- Subjects
Circular permutation ,QH301-705.5 ,Computer science ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Structure (category theory) ,Biochemistry ,Domain (software engineering) ,Structural Biology ,Biology (General) ,Molecular Biology ,Applied Mathematics ,Methodology ,Proteins ,Design systems ,Protein engineering ,Protein structure prediction ,Circular permutation in proteins ,Computer Science Applications ,Protein structure modeling ,Linker ,Algorithm ,Algorithms
- Abstract
Background: This work aims to help develop new protein engineering techniques based on a structural rearrangement phenomenon called circular permutation (CP), equivalent to connecting the native termini of a protein followed by creating new termini at another site. Although CP has been applied in many fields, its implementation is still costly because of inevitable trial and error. Results: Here we present CirPred, a structure modeling and termini linker design method for circularly permuted proteins. Compared with state-of-the-art protein structure modeling methods, CirPred is the only one fully capable of both circularly-permuted modeling and traditional co-linear modeling. CirPred performs well when the permutant shares low sequence identity with the native protein and even when the permutant adopts a different conformation from the native protein because of three-dimensional (3D) domain swapping. Linker redesign experiments demonstrated that the linker design algorithm of CirPred achieved subangstrom accuracy. Conclusions: The CirPred system is capable of (1) predicting the structure of circular permutants, (2) designing termini linkers, (3) performing traditional co-linear protein structure modeling, and (4) identifying the CP-induced occurrence of 3D domain swapping. This method is expected to help broaden the application of CP, and its web server is available at http://10.life.nctu.edu.tw/CirPred/ and http://lo.life.nctu.edu.tw/CirPred/.
- Published
- 2021
8. A-Prot: protein structure modeling using MSA transformer
- Author
-
Jiho Lee, Jong-Min Ko, and Yun-Chul Hong
- Subjects
Models, Molecular ,Computer science ,Property (programming) ,Applied Mathematics ,Structure (category theory) ,Proteins ,Dihedral angle ,Biochemistry ,Computer Science Applications ,Electric Power Supplies ,Structural Biology ,Feature (machine learning) ,Tensor ,Language model ,Protein structure modeling ,Algorithm ,Molecular Biology ,Sequence Alignment ,Transformer (machine learning model)
- Abstract
Background: The accuracy of protein 3D structure prediction has been dramatically improved with the help of advances in deep learning. In the recent CASP14, DeepMind demonstrated that their new version of AlphaFold (AF) produces highly accurate 3D models close to experimental structures. The success of AF shows that the multiple sequence alignment of a sequence contains rich evolutionary information, leading to accurate 3D models. Despite the success of AF, only the prediction code is open, and training a similar model requires a vast amount of computational resources. Thus, developing a lighter prediction model is still necessary. Results: In this study, we propose a new protein 3D structure modeling method, A-Prot, using MSA Transformer, one of the state-of-the-art protein language models. An MSA feature tensor and row attention maps are extracted and converted into 2D residue-residue distance and dihedral angle predictions for a given MSA. We demonstrated that A-Prot predicts long-range contacts better than the existing methods. Additionally, we modeled the 3D structures of the free modeling and hard template-based modeling targets of CASP14. The assessment shows that the A-Prot models are more accurate than most top server groups of CASP14. Conclusion: These results imply that A-Prot accurately captures the evolutionary and structural information of proteins with relatively low computational cost. Thus, A-Prot can provide a clue for the development of other protein property prediction methods.
- Published
- 2021
9. Accurate prediction of protein structures and interactions using a 3-track network
- Author
-
Nick V. Grishin, Minkyung Baek, Udit Dalwadi, Gyu Rie Lee, Hahnbeom Park, Carson Adams, van Dijk Aa, Manoj K. Rathinaswamy, Theo Sagmeister, Qian Cong, Frank DiMaio, Randy J. Read, David Baker, Paul D. Adams, Sergey Ovchinnikov, Buhlheller C, Calvin K. Yip, Caleb R. Glassman, Ivan Anishchenko, Schaeffer Rd, Claudia Millán, Diederik J. Opperman, Tea Pavkov-Keller, Jose Henrique Pereira, Ana C. Ebrecht, Lisa N. Kinch, Jing Wang, John E. Burke, Kenan Christopher Garcia, Andria V. Rodrigues, Justas Dauparas, and Andy DeGiovanni
- Subjects
Structure (mathematical logic) ,Network architecture ,Sequence ,Protein structure ,Computer science ,Data mining ,Track (rail transport) ,Protein structure modeling ,computer.software_genre ,computer ,Distance transform
- Abstract
DeepMind presented remarkably accurate protein structure predictions at the CASP14 conference. We explored network architectures incorporating related ideas and obtained the best performance with a 3-track network in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The 3-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables rapid solution of challenging X-ray crystallography and cryo-EM structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate models of protein-protein complexes from sequence information alone, short circuiting traditional approaches which require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
One-Sentence Summary: Accurate protein structure modeling enables rapid solution of structure determination problems and provides insights into biological function.
- Published
- 2021
10. Rapid determination of quaternary protein structures in complex biological samples
- Author
-
Lotta Happonen, Simon Hauri, Hamed Khakzad, Johan Teleman, Johan Malmström, Lars Malmström, University of Zurich, and Malmström, Johan
- Subjects
0301 basic medicine ,Distance constraints ,530 Physics ,Science ,General Physics and Astronomy ,1600 General Chemistry ,Genetics and Molecular Biology ,Peptide ,02 engineering and technology ,Computational biology ,Mass spectrometry ,General Biochemistry, Genetics and Molecular Biology ,Article ,03 medical and health sciences ,Protein structure ,1300 General Biochemistry, Genetics and Molecular Biology ,Tandem Mass Spectrometry ,Humans ,Protein Structure, Quaternary ,lcsh:Science ,chemistry.chemical_classification ,Chromatography, Reverse-Phase ,Multidisciplinary ,General Chemistry ,Blood Proteins ,021001 nanoscience & nanotechnology ,3100 General Physics and Astronomy ,Healthy Volunteers ,Recombinant Proteins ,Molecular Docking Simulation ,030104 developmental biology ,Cross-Linking Reagents ,chemistry ,Docking (molecular) ,10231 Institute for Computational Science ,Multiprotein Complexes ,General Biochemistry ,Protein quaternary structure ,lcsh:Q ,0210 nano-technology ,Surface protein ,Protein structure modeling ,Algorithms ,Software
- Abstract
The understanding of complex biological systems is still hampered by limited knowledge of biologically relevant quaternary protein structures. Here, we demonstrate quaternary structure determination in biological samples using a combination of chemical cross-linking, high-resolution mass spectrometry and high-accuracy protein structure modeling. This approach, termed targeted cross-linking mass spectrometry (TX-MS), relies on computational structural models to score sets of targeted cross-linked peptide signals acquired using a combination of mass spectrometry acquisition techniques. We demonstrate the utility of TX-MS by creating a high-resolution quaternary model of a 1.8 MDa protein complex composed of a pathogen surface protein and ten human plasma proteins. The model is based on a dense network of cross-link distance constraints obtained directly in a mixture of human plasma and live bacteria. These results demonstrate that TX-MS can increase the applicability of flexible backbone docking algorithms to large protein complexes by providing rich cross-link distance information from complex biological samples.
Protein structure determination in complex biological samples is still challenging. Here, the authors develop a computational modeling-guided cross-linking mass spectrometry method, obtaining a high-resolution model of a 1.8 MDa protein assembly from cross-links detected in a mixture of human plasma and bacteria.
- Published
- 2019
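The abstract of result 10 describes models built on cross-link distance constraints. To make that idea concrete, the sketch below counts how many observed cross-links a candidate model satisfies. It is a generic illustration, not the TX-MS scoring scheme: the function name `satisfied_fraction`, the use of Cα coordinates, and the 30 Å default cutoff are all assumptions chosen for the example, and the appropriate cutoff depends on the cross-linking reagent.

```python
import math


def satisfied_fraction(ca_coords, crosslinks, max_distance=30.0):
    """Fraction of cross-links whose two residues lie within max_distance (Å).

    ca_coords: dict mapping a residue id to the (x, y, z) of its Cα atom
    crosslinks: list of (residue_i, residue_j) pairs observed experimentally
    max_distance: hypothetical cutoff; the real value depends on the reagent
    """
    if not crosslinks:
        raise ValueError("no cross-links to evaluate")
    satisfied = sum(
        1
        for i, j in crosslinks
        if math.dist(ca_coords[i], ca_coords[j]) <= max_distance
    )
    return satisfied / len(crosslinks)


# Toy model (invented coordinates): residues 1 and 2 are close, residue 3 is far away.
toy_model = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (50.0, 0.0, 0.0)}
toy_links = [(1, 2), (1, 3)]

print(satisfied_fraction(toy_model, toy_links))  # one of two links is satisfied: 0.5
```

Comparing this fraction across alternative docking poses is one simple way to rank them against cross-linking data; a denser set of cross-links makes the ranking more discriminating.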
11. DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes
- Author
-
Dong Si, Nhut Minh Phan, and Jonas Pfab
- Subjects
2019-20 coronavirus outbreak ,de novo ,Multidisciplinary ,Coronavirus disease 2019 (COVID-19) ,Molecular Structure ,Computer science ,Cryo-electron microscopy ,SARS-CoV-2 ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,Cryoelectron Microscopy ,modeling ,Biological Sciences ,Set (abstract data type) ,Models, Structural ,Viral Proteins ,Biophysics and Computational Biology ,Deep Learning ,Fully automated ,Physical Sciences ,cryo-EM ,structure ,Biological system ,Protein structure modeling ,complex
- Abstract
Significance: Electron cryomicroscopy (cryo-EM), a 2017 Nobel prize-awarded technology, provides direct 3D maps of macromolecules and explains the shape and interactions of protein complexes such as SARS-CoV-2 viral proteins and human cell receptors. This understanding can be combined with detailed structural information gathered using other technologies to form the basis for modeling the course of diseases and for designing therapeutic drugs. However, ab initio modeling of protein complex structure remains a challenging problem. Here, we present DeepTracer, a fully automated and robust tool that determines the all-atom structure of a protein complex based solely on its cryo-EM map and amino acid sequence, with improved accuracy and efficiency compared to previous methods. We also provide a web service for global access.
Information about macromolecular structure of protein complexes and related cellular and molecular mechanisms can assist the search for vaccines and drug development processes. To obtain such structural information, we present DeepTracer, a fully automated deep learning-based method for fast de novo multichain protein complex structure determination from high-resolution cryoelectron microscopy (cryo-EM) maps. We applied DeepTracer on a previously published set of 476 raw experimental cryo-EM maps and compared the results with a current state of the art method. The residue coverage increased by over 30% using DeepTracer, and the rmsd value improved from 1.29 Å to 1.18 Å. Additionally, we applied DeepTracer on a set of 62 coronavirus-related cryo-EM maps, among them 10 with no deposited structure available in EMDataResource. We observed an average residue match of 84% with the deposited structures and an average rmsd of 0.93 Å. Additional tests with related methods further exemplify DeepTracer’s competitive accuracy and efficiency of structure modeling. DeepTracer allows for exceptionally fast computations, making it possible to trace around 60,000 residues in 350 chains within only 2 h. The web service is globally accessible at https://deeptracer.uw.edu.
- Published
- 2020
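Several abstracts in this listing, including result 11, report accuracy as a root-mean-square deviation (RMSD) in Å. For readers unfamiliar with the metric, here is a minimal Python sketch. It assumes the two structures are already superimposed and that atoms are matched by index; the function name `rmsd` and the toy coordinates are illustrative only, and this is not the evaluation code used by any of the listed papers.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed and that point i in
    coords_a corresponds to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for p, q in zip(coords_a, coords_b):
        # Squared Euclidean distance between the matched pair of points.
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total / len(coords_a))


# Toy data: the model is the reference shifted by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

print(rmsd(model, reference))  # every atom is displaced by 1 Å, so this prints 1.0
```

In practice an optimal superposition step usually precedes this calculation, and RMSD is computed only over matched residues, which is why the abstract reports residue coverage alongside it.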
12. DeepTracer: Fast Cryo-EM Protein Structure Modeling and Special Studies on CoV-related Complexes
- Author
-
Jonas Pfab, Nhut Minh Phan, and Dong Si
- Subjects
Set (abstract data type) ,Materials science ,Cryo-electron microscopy ,Computation ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,Fully automatic ,Biological system ,Protein structure modeling
- Abstract
Information about macromolecular structure of protein complexes such as SARS-CoV-2, and related cellular and molecular mechanisms can assist the search for vaccines and drug development processes. To obtain such structural information, we present DeepTracer, a fully automatic deep learning-based method for fast de novo multi-chain protein complex structure determination from high-resolution cryo-electron microscopy (cryo-EM) density maps. We applied DeepTracer on a previously published set of 476 raw experimental density maps and compared the results with a current state of the art method. The residue coverage increased by over 30% using DeepTracer and the RMSD value improved from 1.29Å to 1.18Å. Additionally, we applied DeepTracer on a set of 62 coronavirus-related density maps, among them 10 with no deposited structure available in EMDataResource. We observed an average residue match of 84% with the deposited structures and an average RMSD of 0.93Å. Additional tests with related methods further exemplify DeepTracer’s competitive accuracy and efficiency of structure modeling. DeepTracer allows for exceptionally fast computations, making it possible to trace around 60,000 residues in 350 chains within only two hours. The web service is globally accessible at https://deeptracer.uw.edu.
- Published
- 2020
13. Experimentally-Driven Protein Structure Modeling
- Author
-
Nikolay V. Dokholyan
- Subjects
0301 basic medicine ,Models, Molecular ,030102 biochemistry & molecular biology ,Molecular model ,Computer science ,Specific time ,Biophysics ,Structure (category theory) ,Molecular Conformation ,Experimental data ,Proteins ,Statistical mechanics ,Resolution (logic) ,Molecular Dynamics Simulation ,Biochemistry ,Field (computer science) ,Article ,03 medical and health sciences ,030104 developmental biology ,Statistical physics ,Protein structure modeling
- Abstract
Revolutions in the natural and exact sciences that started at the dawn of the last century have led to an explosion of theoretical, experimental, and computational approaches to determine the structures of molecules and complexes, as well as their rich conformational dynamics. Since different experimental methods produce information that is attributed to specific time and length scales, the corresponding computational methods have to be tailored to these scales and experiments. These methods can then be combined and integrated across scales, producing a fuller picture of molecular structure and motion from the "puzzle pieces" offered by various experiments. Here, we describe a number of computational approaches that use experimental data to gain a glimpse into the structure of proteins and understand their dynamics. We also discuss the limitations and the resolution of constraints-based modeling approaches. SIGNIFICANCE: Experimentally-driven computational structure modeling and determination is a rapidly evolving alternative to traditional approaches for molecular structure determination. These new hybrid experimental-computational approaches are proving to be a powerful microscope for examining the structural features of intrinsically or partially disordered proteins and the dynamics of molecules and complexes. In this review, we describe various approaches in the field of experimentally-driven computational structure modeling.
- Published
- 2020
14. Combining Information from Crosslinks and Monolinks in the Modeling of Protein Structures
- Author
-
Mallur S. Madhusudhan, Maya Topf, Matthew Sinnott, Sony Malhotra, and Konstantinos Thalassinos
- Subjects
Models, Molecular ,Proteomics ,0303 health sciences ,Computer science ,Protein Conformation ,030302 biochemistry & molecular biology ,Experimental data ,Proteins ,Bulk water ,03 medical and health sciences ,Protein structure ,Structural Biology ,Protein structure modeling ,Biological system ,Databases, Protein ,Molecular Biology ,Creatine Kinase ,Software ,030304 developmental biology
- Abstract
Monolinks are produced in a chemical crosslinking mass spectrometry experiment and are more abundant than crosslinks. They convey residue exposure information, but so far have not been used in the modeling of protein structures. Here, we present the Monolink Depth Score (MoDS), for assessing structural models based on the depth of monolinked residues, corresponding to their distance to the nearest bulk water. Using simulated and reprocessed experimental data from the Proteomic Identification Database, we compare the performance of MoDS to MNXL, our previously developed score for assessing models based on crosslinking data. Our results show that MoDS can be used to effectively score models based on monolinks, and that a crosslink/monolink combined score (XLMO) leads to overall higher performance. The work strongly supports the use of monolink data in the context of integrative structure determination. We also present XLM-Tools, a program to assist in this effort, available at: https://github.com/Topf-Lab/XLM-Tools.
- Published
- 2020
15. Dynamic Evolution of the Cthrc1 Genes, a Newly Defined Collagen-Like Family
- Author
-
Uri Gat, Dina Schneidman-Duhovny, M. Braitbard, Tal S Nir, Lucas Leclère, Michael Bazarsky, Laboratoire de Biologie du Développement de Villefranche sur mer (LBDV), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Institut de la Mer de Villefranche (IMEV), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), and The Hebrew University of Jerusalem (HUJ)
- Subjects
collagen ,Frizzled ,[SDV]Life Sciences [q-bio] ,Collagen helix ,C1q domain ,Biology ,phylogeny ,protein structure modeling ,Evolution, Molecular ,03 medical and health sciences ,Cnidaria ,Mice ,0302 clinical medicine ,Clytia ,Phylogenetics ,Genetics ,Gene family ,Animals ,Humans ,Gene ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Nematostella ,0303 health sciences ,Collagen Triple Helix Repeat-Containing Protein 1 ,Extracellular Matrix Proteins ,Likelihood Functions ,Wnt signaling pathway ,gene loss ,Cthrc1 ,Cell biology ,Sea Anemones ,030217 neurology & neurosurgery ,Research Article
- Abstract
Collagen triple helix repeat containing protein 1 (Cthrc1) is a secreted glycoprotein reported to regulate collagen deposition and to be linked to the Transforming growth factor β/Bone morphogenetic protein and the Wnt/planar cell polarity pathways. It was first identified as being induced upon injury to rat arteries and was found to be highly expressed in multiple human cancer types. Here, we explore the phylogenetic and evolutionary trends of this metazoan gene family, previously studied only in vertebrates. We identify Cthrc1 orthologs in two distant cnidarian species, the sea anemone Nematostella vectensis and the hydrozoan Clytia hemisphaerica, both of which harbor multiple copies of this gene. We find that Cthrc1 clade-specific diversification occurred multiple times in cnidarians as well as in most metazoan clades where we detected this gene. Many other groups, such as arthropods and nematodes, have entirely lost this gene family. Most vertebrates display a single highly conserved gene, and we show that the sequence evolutionary rate of Cthrc1 drastically decreased within the gnathostome lineage. Interestingly, this reduction coincided with the origin of its conserved upstream neighboring gene, Frizzled 6 (FZD6), which in mice has been shown to functionally interact with Cthrc1. Structural modeling methods further reveal that the yet uncharacterized C-terminal domain of Cthrc1 is similar in structure to the globular C1q superfamily domain, also found in the C-termini of collagens VIII and X. Thus, our studies show that the Cthrc1 genes are a collagen-like family with a variable short collagen triple helix domain and a highly conserved C-terminal domain structure resembling the C1q family.
- Published
- 2020
16. Modifying inter-cistronic sequence significantly enhances IRES dependent second gene expression in bicistronic vector: Construction of optimised cassette for gene therapy of familial hypercholesterolemia
- Author
-
Huseyin Mehmet, Wajahatullah Khan, Futwan Al-Mohanna, Zainularifeen Abduljaleel, Simon N. Waddington, Michael Themis, Charles Coutelle, Zuhair N. Al-Hassnan, S Apostolidou, Brian W. Bigger, Mohiuddin M. Taher, Mohammad Athar, Abdellatif Bouazzaoui, Mukaddes Colakogullari, Mohammed N. Al-Ahdal, and Faisal A. Al-Allaf
- Subjects
0301 basic medicine ,lcsh:QH426-470 ,Base pair ,Internal ribosome entry site ,Familial hypercholesterolemia ,Biochemistry ,Article ,03 medical and health sciences ,0302 clinical medicine ,Gene therapy ,Cistron ,IRES ,Gene expression ,Genetics ,Inter-cistronic sequences ,Molecular Biology ,Gene ,Messenger RNA ,Chemistry ,Biochemistry (medical) ,Translation (biology) ,MD simulation ,Transfection ,Cell biology ,lcsh:Genetics ,030104 developmental biology ,030220 oncology & carcinogenesis ,Protein structure modeling - Abstract
Internal ribosome entry site (IRES) sequences have become a valuable tool in the construction of gene transfer and therapeutic vectors for multi-cistronic gene expression from a single mRNA transcript. The optimal conditions for effective use of this sequence to construct a functional expression vector are not precisely defined, but it is generally assumed that the IRES-dependent expression of the second gene in such a cassette is less efficient than the cap-dependent expression of the first gene. Here we show that tailoring the inter-cistronic sequence significantly enhances IRES-dependent expression of the second gene in a bicistronic vector, supporting the construction of an optimised cassette for gene therapy of familial hypercholesterolemia. We tailored the size of the inter-cistronic spacer sequence at the 5′ region of the internal ribosome entry site sequence using sequential deletions and demonstrated that the expression of the 3′ gene can be significantly increased to levels similar to the cap-dependent expression of the 5′ gene. Maximum expression efficiency of the downstream gene was obtained when the spacer was composed of 18–141 base pairs. In this case a single mRNA transcriptional unit containing both the first and the second cistron was detected, whereas constructs with spacer sequences of 216 bp or longer generated a single transcriptional unit containing only the first cistron. This suggests that long spacers may affect transcription termination. When the spacer was 188 bp, both transcripts were produced simultaneously in most transfected cells, while a fraction of them expressed only the first but not the second gene. Expression analyses of vectors containing optimised cassettes clearly confirm that efficient gene transfer and biological activity of the expressed transgenic proteins in the transduced cells can be achieved.
Furthermore, computational analysis was carried out by molecular dynamics (MD) simulation to determine the most emerges as viable containing specific binding site and bridging of 5′ and 3′ ends involving direct RNA-RNA contacts and RNA-protein interactions. These results provide a mechanistic basis for translation stimulation and RNA resembling for the synergistic stimulation of cap-dependent translation. Keywords: Internal ribosome entry site, IRES, Gene therapy, Inter-cistronic sequences, MD simulation, Protein structure modeling, Familial hypercholesterolemia
- Published
- 2018
17. De novo main-chain modeling with MAINMAST in 2015/2016 EM Model Challenge
- Author
-
Daisuke Kihara and Genki Terashi
- Subjects
0301 basic medicine ,Map interpretation ,Computer science ,Protein Conformation ,Minimum spanning tree ,Article ,Interpretation (model theory) ,03 medical and health sciences ,0302 clinical medicine ,Software ,Chain (algebraic topology) ,Structural Biology ,Position (vector) ,MAINMAST ,Rosetta ,Electron microscopy ,Confidence score ,Protocol (object-oriented programming) ,Cryo-EM ,business.industry ,Cryoelectron Microscopy ,Proteins ,Longest path problem ,030104 developmental biology ,Main-chain trace ,CryoEM Model Challenge ,Protein structure modeling ,confidence score ,business ,Algorithm ,030217 neurology & neurosurgery ,Mean shifting algorithm - Abstract
Protein tertiary structure modeling is a critical step in the interpretation of three-dimensional (3D) electron microscopy density. Our group participated in the 2015/2016 EM Model Challenge using the MAINMAST software for de novo main-chain modeling. The software generates local dense points using the mean shift algorithm and connects them into Cα models by calculating the minimum spanning tree and the longest path. Subsequently, full-atom structure models are generated and subjected to structural refinement. Here, we summarize the quality of our submitted models and examine successful and unsuccessful models, including 3D models we did not submit to the Challenge. Our protocol using the MAINMAST software was sometimes able to build correct conformations with 3.4–5.1 Å RMSD. Unsuccessful models had failures in chain tracing; however, their Cα positions and some local structures were largely built correctly. To evaluate the quality of the models, the MAINMAST software provides a confidence score for each Cα position from the consensus of the top 100 scoring models.
- Published
- 2018
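The MAINMAST abstract above describes connecting local dense points through a minimum spanning tree and then taking the longest path in that tree. The sketch below illustrates only that graph step on toy coordinates. It is not the MAINMAST code: the function names, the use of a complete Euclidean graph, and Prim's algorithm are assumptions made for illustration.

```python
import math
from collections import defaultdict

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph (illustrative choice).

    Returns an adjacency mapping: node -> list of (neighbor, edge_length).
    """
    n = len(points)
    tree = defaultdict(list)
    if n == 0:
        return tree
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = math.dist(points[u], points[parent[u]])
            tree[u].append((parent[u], w))
            tree[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return tree

def farthest(tree, start):
    """Depth-first walk of the tree; returns (farthest node, its distance, parent map)."""
    dist, prev, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        u = stack.pop()
        for v, w in tree[u]:
            if v not in dist:
                dist[v], prev[v] = dist[u] + w, u
                stack.append(v)
    end = max(dist, key=dist.get)
    return end, dist[end], prev

def longest_path(tree, any_node=0):
    """Longest path in a tree via two sweeps (the tree 'diameter')."""
    a, _, _ = farthest(tree, any_node)
    b, length, prev = farthest(tree, a)
    path, node = [], b
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], length

# Toy "dense points": a straight chain of four points plus one short side branch.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (2, 0.5, 0)]
path, length = longest_path(minimum_spanning_tree(points))
print(path, length)
```

In this toy case the longest path follows the four collinear points and leaves out the side branch, which is the behaviour one would want from a main-chain trace; real density-derived points would need edge weighting and refinement that this sketch does not attempt.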
18. Protein Tertiary Structure by Crosslinking/Mass Spectrometry
- Author
-
Michael Schneider, Adam Belsom, and Juri Rappsilber
- Subjects
0301 basic medicine ,030102 biochemistry & molecular biology ,Chemistry ,technology, industry, and agriculture ,Proteins ,Computational biology ,Mass spectrometry ,Biochemistry ,Article ,Mass Spectrometry ,Protein tertiary structure ,Protein Structure, Tertiary ,03 medical and health sciences ,Cross-Linking Reagents ,030104 developmental biology ,Protein structure ,Structural biology ,Humans ,Protein structure modeling ,Molecular Biology - Abstract
Observing the structures of proteins within the cell and tracking structural changes under different cellular conditions are the ultimate challenges for structural biology. This, however, requires an experimental technique that can generate sufficient data for structure determination and is applicable in the native environment of proteins. Crosslinking/mass spectrometry (CLMS) and protein structure determination have recently advanced to meet these requirements and crosslinking-driven de novo structure determination in native environments is now possible. In this opinion article, we highlight recent successes in the field of CLMS with protein structure modeling and challenges it still holds. Highlights: The earliest structural studies on proteins using crosslinking/mass spectrometry aimed to elucidate their tertiary three-dimensional structure. Tertiary structure modeling using crosslinking fell out of favor for almost two decades because crosslink data were not informative to aid structure modeling. Two game-changing trends emerged: using short-range crosslinkers that capture relevant modeling information and high-density crosslinking. High-density crosslinking uses unspecific crosslinkers to dramatically increase crosslink numbers. In addition, computational structure modeling methods made significant progress in exploiting CLMS data. The combination of high-density crosslinking and computational structure modeling enables the elucidation of tertiary protein structure in native environments. This sidesteps the key limitation of today’s structure determination methods, which are unable (except for a few, specialized methods) to probe the structure of proteins in cell lysates or even intact cells.
- Published
- 2018
19. Computational Study of Oryza sativa Germin Like Protein 1 (OsRGLP1), from Genome Sequence to Protein Structure; Modeling and Interaction
- Author
-
Dure Shahwar
- Subjects
0301 basic medicine ,Whole genome sequencing ,03 medical and health sciences ,030104 developmental biology ,Oryza sativa ,Computational biology ,Biology ,General Agricultural and Biological Sciences ,Protein structure modeling - Published
- 2018
20. MAINMAST: de novo protein structure modeling for cryo-EM maps assisted by structure feature detection by deep learning
- Author
-
Genki Terashi, Xiao Wang, and Daisuke Kihara
- Subjects
business.industry ,Computer science ,Cryo-electron microscopy ,Deep learning ,Pattern recognition ,Condensed Matter Physics ,Biochemistry ,Inorganic Chemistry ,Structural Biology ,General Materials Science ,Artificial intelligence ,Physical and Theoretical Chemistry ,business ,Protein structure modeling ,Feature detection (computer vision) - Published
- 2021
21. Identification of Genetic and Protein Markers in Salmonella enterica serovar Typhimurium by Bioinformatic Analyses for the Purpose of Diagnosis and Treatment
- Author
-
Mojdeh Amandadi, Hadi Ravan, and Mehdi Hassanshahian
- Subjects
Salmonella typhimurium ,Protein structure modeling ,Specific markers ,Microbiology ,QR1-502 - Abstract
Background and Aims: Salmonella enterica serovar Typhimurium is one of the common causes of food poisoning in humans. Since the selection of appropriate markers is one of the main challenges for the detection of this pathogen, in the current study, genetic markers of this serovar were screened using bioinformatic tools. In the second phase, the structure and function of the proteins encoded by these markers were determined. Materials and Methods: This study was conducted between 2016 and 2017. In order to find the genetic markers of Salmonella enterica serovar Typhimurium, 45 complete genomes belonging to Salmonella enterica serovar Typhimurium and other genera of the Enterobacteriaceae family were compared using Mauve software. To determine the structure and function of the proteins encoded by these sequences, I-TASSER and Phyre2 software, together with the CDD, InterProScan, DALI, and ProFunc databases, were used for structural and functional modeling, respectively. Results: Specific regions of the STM4491-STM4496 genes were determined as specific markers for Salmonella enterica serovar Typhimurium. The functions of the proteins encoded by these markers were proposed to fall into five groups: Lon protease, nucleotide-binding proteins, nucleoside triphosphatases (NTPases), proteins involved in DNA repair, and DNA methylase. Conclusions: Specific regions of the STM4491-STM4496 genes can be used as effective diagnostic targets for the detection of pathogenic Salmonella enterica serovar Typhimurium. Moreover, proteins encoded by these genes can be suggested as suitable targets for the design of new therapeutic agents to prevent and treat the infections caused by this pathogen.
- Published
- 2017
22. Genomic Characteristics of Gender Dysphoria Patients and Identification of Rare Mutations in RYR3 Gene
- Author
-
Yixuan Ji, Haixia Ding, Wen Li, Xiao-hai Zhu, Fu Yang, Qing Zhang, Bang Xiao, Ningxia Sun, Jin-zhao Ma, and Shuhan Sun
- Subjects
0301 basic medicine ,Nonsynonymous substitution ,Gender dysphoria ,Adult ,Male ,China ,Science ,Biology ,medicine.disease_cause ,Article ,03 medical and health sciences ,0302 clinical medicine ,Rare mutations ,Exome Sequencing ,medicine ,Humans ,Genetic Predisposition to Disease ,Gender Dysphoria ,Gene ,Genetics ,Mutation ,Multidisciplinary ,Whole Genome Sequencing ,Gene ontology ,Genetic variants ,Computational Biology ,Ryanodine Receptor Calcium Release Channel ,medicine.disease ,Models, Structural ,030104 developmental biology ,Case-Control Studies ,Medicine ,Female ,Protein structure modeling ,030217 neurology & neurosurgery ,Transsexualism - Abstract
Gender dysphoria (GD) is characterized by an incongruence between the gender assigned at birth and the gender with which one identifies. The biological mechanisms of GD are unclear. While common genetic variants are associated with GD, positive findings have not always been replicated. To explore the role of rare variants in GD susceptibility within the Han Chinese population, whole-genome sequencing of 9 Han female-to-male transsexuals (FtMs) and whole-exome sequencing of 4 Han male-to-female transsexuals (MtFs) were analyzed using a pathway burden analysis in which variants are first collapsed at the gene level and then by Gene Ontology terms. Novel nonsynonymous variants in ion transport genes were significantly enriched in FtMs (P-value, 2.41E-10; fold enrichment, 2.8) and MtFs (P-value, 1.04E-04; fold enrichment, 2.3). Gene burden analysis comparing 13 GD cases and 100 controls implicated RYR3, with three heterozygous damaging mutations in unrelated FtMs and zero in controls (P = 0.001). Importantly, protein structure modeling of the RYR3 mutations indicated that the R1518H mutation made a large structural change in the RYR3 protein. Overall, our results provide information about the genetic basis of GD.
- Published
- 2017
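As a rough plausibility check on the gene-burden figure quoted in the RYR3 abstract above (three carriers among 13 cases, none among 100 controls, P = 0.001), the counts can be put through a one-sided Fisher exact test. The abstract does not name the test that was used, so the choice of test is an assumption, and the function name is illustrative only.

```python
from math import comb

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact test on a 2x2 carrier table.

    Returns the hypergeometric probability of seeing at least `case_carriers`
    carriers among the cases, with all table margins held fixed.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(total, carriers)
    upper = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom

# Counts taken from the abstract: 3 of 13 cases, 0 of 100 controls.
p = fisher_one_sided(3, 13, 0, 100)
print(f"{p:.4f}")  # prints 0.0012
```

Under this assumed test the counts give a value of the same order as the reported P = 0.001; any difference in the last digit could reflect a different test or rounding in the original analysis.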
23. Assessing the accuracy of contact predictions in CASP13
- Author
-
Rojan Shrestha, Eduardo Fajardo, Andras Fiser, Krzysztof Fidelis, Nelson Gil, Andriy Kryshtafovych, and Bohdan Monastyrskyy
- Subjects
0303 health sciences ,Sequence ,Artificial neural network ,business.industry ,Computer science ,030302 biochemistry & molecular biology ,Machine learning ,computer.software_genre ,Biochemistry ,Article ,03 medical and health sciences ,Structural Biology ,Artificial intelligence ,Evolutionary information ,business ,Protein structure modeling ,Molecular Biology ,computer ,030304 developmental biology - Abstract
The accuracy of sequence-based tertiary contact predictions was assessed in a blind prediction experiment at the CASP13 meeting. After four years of significant improvements in prediction accuracy, another dramatic advance has taken place since CASP12 was held two years ago. The precision of predicting the top L/5 contacts in the free modeling category, where L is the corresponding length of the protein in residues, has exceeded 70%. As a comparison, the best-performing group at CASP12 with a 47% precision would have finished below the top 1/3 of the CASP13 groups. Extensively trained deep neural network approaches dominate the top performing algorithms, which appear to efficiently integrate information on co-evolving residues and interacting fragments or possibly utilize memories of sequence similarities and sometimes can deliver accurate results even in the absence of virtually any target specific evolutionary information. If the current performance is evaluated by F-score on L contacts, it stands around 24% right now, which, despite the tremendous impact and advance in improving its utility for structure modeling, also suggests that there is much room left for further improvement.
- Published
- 2019
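The CASP13 contact-assessment abstract above reports precision on the top L/5 predicted contacts and an F-score on L contacts, where L is the protein length in residues. A minimal sketch of those two quantities follows. The data layout, function names, and toy numbers are assumptions, and the official assessment applies further rules (for example sequence-separation ranges and a distance cutoff defining a true contact) that are not modelled here.

```python
def top_contact_precision(predicted, true_contacts, seq_len, divisor=5):
    """Precision of the top L/divisor predicted contacts.

    predicted: list of (i, j, score) tuples, higher score = more confident.
    true_contacts: set of (i, j) residue pairs with i < j.
    """
    k = max(1, seq_len // divisor)
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j, _ in ranked if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(ranked)

def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: a 10-residue protein, so the top L/5 list holds 2 contacts.
predicted = [(1, 8, 0.9), (2, 9, 0.8), (3, 7, 0.1)]
true_contacts = {(1, 8), (3, 7)}
print(top_contact_precision(predicted, true_contacts, seq_len=10))  # 0.5
```

Setting `divisor=1` evaluates the top L contacts instead, which is the list length the abstract uses when quoting the F-score.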
24. Path-LZerD: Predicting Assembly Order of Multimeric Protein Complexes
- Author
-
Charles Christoffer, Daisuke Kihara, and Genki Terashi
- Subjects
0303 health sciences ,Multiprotein complex ,Chemistry ,030303 biophysics ,Complex formation ,Protein–protein interaction ,03 medical and health sciences ,Docking (molecular) ,Ppi network ,Molecular mechanism ,Macromolecular docking ,Biological system ,Protein structure modeling ,030304 developmental biology - Abstract
Many important functions in a cell are carried out by protein complexes with more than two subunits. Similar to the folding of a single protein, multimeric protein complexes in general follow an energetically favored assembly path. Knowing the assembly path not only provides critical information about the molecular mechanism of the assembly but also serves as a foundation for artificial design of protein complexes, as well as development of drugs that interfere with complex formation. There are experimental approaches for determining the assembly path of a complex; however, such methods are resource intensive. We have recently developed a computational method, Path-LZerD, which predicts the assembly path of a complex by simulating the docking process of the complex. Here, we explain how to use the Path-LZerD software with examples.
- Published
- 2019
25. OPUS-Rota2: An Improved Fast and Accurate Side-Chain Modeling Method
- Author
-
Qinghua Wang, Tianqi Ma, Junqing Du, Jianpeng Ma, and Gang Xu
- Subjects
Models, Molecular ,Work (thermodynamics) ,010304 chemical physics ,Basis (linear algebra) ,Computer science ,Protein Conformation ,Proteins ,Protein structure prediction ,01 natural sciences ,Computer Science Applications ,Orders of magnitude (time) ,Chain (algebraic topology) ,Test set ,0103 physical sciences ,Side chain ,Physical and Theoretical Chemistry ,Protein structure modeling ,Algorithm ,Hydrophobic and Hydrophilic Interactions ,Algorithms - Abstract
Side-chain modeling plays a critical role in protein structure prediction. However, in many current methods, balancing the speed and accuracy is still challenging. In this paper, on the basis of our previous work OPUS-Rota (Protein Sci. 2008, 17, 1576–1585), we introduce a new side-chain modeling method, OPUS-Rota2, which is tested on both a 65-protein test set (DB65) in the OPUS-Rota paper and a 379-protein test set (DB379) in the SCWRL4 paper. If the main chain is native, OPUS-Rota2 is more accurate than OPUS-Rota, SCWRL4, and OSCAR-star but slightly less accurate than OSCAR-o. Also, if the main chain is non-native, OPUS-Rota2 is more accurate than any other method. Moreover, OPUS-Rota2 is significantly faster than any other method, in particular, 2 orders of magnitude faster than OSCAR-o. Thus, the combination of higher accuracy and speed of OPUS-Rota2 in modeling side chains on both the native and non-native main chains makes OPUS-Rota2 a very useful tool in protein structure modeling.
- Published
- 2019
26. The haustorial transcriptome of the cucurbit pathogen Podosphaera xanthii reveals new insights into the biotrophy and pathogenesis of powdery mildew fungi
- Author
-
Álvaro Polonio, M. Gonzalo Claros, Pedro Seoane, and Alejandro Pérez-García
- Subjects
0106 biological sciences ,DNA, Complementary ,lcsh:QH426-470 ,Protein Conformation ,lcsh:Biotechnology ,Genomics ,Biology ,01 natural sciences ,Transcriptome ,Fungal Proteins ,03 medical and health sciences ,Podosphaera xanthii ,Ascomycota ,Cucurbita ,Powdery mildew fungi ,Haustorium ,lcsh:TP248.13-248.65 ,Gene Expression Regulation, Fungal ,Genetics ,Massive-scale RNA sequencing ,030304 developmental biology ,Secretome ,Plant Diseases ,0303 health sciences ,Obligate ,Effector ,food and beverages ,Flow Cytometry ,lcsh:Genetics ,MRNA Sequencing ,Gene Ontology ,Host-Pathogen Interactions ,RNA extraction ,Protein structure modeling ,Powdery mildew ,010606 plant biology & botany ,Biotechnology ,Research Article - Abstract
Background: Podosphaera xanthii is the main causal agent of powdery mildew disease in cucurbits and is responsible for important yield losses in these crops worldwide. Powdery mildew fungi are obligate biotrophs. In these parasites, biotrophy is determined by the presence of haustoria, which are specialized structures of parasitism developed by these fungi for the acquisition of nutrients and the delivery of effectors. Detailed molecular studies of powdery mildew haustoria are scarce due mainly to difficulties in their isolation. Therefore, their analysis is considered an important challenge for powdery mildew research. The aim of this work was to gain insights into powdery mildew biology by analysing the haustorial transcriptome of P. xanthii. Results: Prior to RNA isolation and massive-scale mRNA sequencing, a flow cytometric approach was developed to isolate P. xanthii haustoria free of visible contaminants. Next, several commercial kits were used to isolate total RNA and to construct the cDNA and Illumina libraries that were finally sequenced by the Illumina NextSeq system. Using this approach, the maximum amount of information from low-quality RNA that could be obtained was used to accomplish the de novo assembly of the P. xanthii haustorial transcriptome. The subsequent analysis of this transcriptome and comparison with the epiphytic transcriptome allowed us to identify the importance of several biological processes for haustorial cells such as protection against reactive oxygen species, the acquisition of different nutrients and genetic regulation mediated by non-coding RNAs. In addition, we could also identify several secreted proteins expressed exclusively in haustoria such as cell adhesion proteins that have not been related to powdery mildew biology to date. Conclusions: This work provides a novel approach to study the molecular aspects of powdery mildew haustoria.
In addition, the results of this study have also allowed us to identify certain previously unknown processes and proteins involved in the biology of powdery mildews that could be essential for their biotrophy and pathogenesis. Electronic supplementary material The online version of this article (10.1186/s12864-019-5938-0) contains supplementary material, which is available to authorized users.
- Published
- 2019
27. Assessment of chemical-crosslink-assisted protein structure modeling in CASP13
- Author
-
Bohdan Monastyrskyy, Juri Rappsilber, Krzysztof Fidelis, J. Eduardo Fajardo, Mikhail Karasikov, Alexander Leitner, Agnieszka S. Karczyńska, Andras Fiser, Adam Belsom, Silvia Crivelli, Emilia A. Lubecka, Celina Sikorska, Andriy Kryshtafovych, Nelson Gil, Cezary Czaplewski, Rojan Shrestha, Esben Trabjerg, Sergei Grudinin, Guillaume Pagès, Adam Liwo, Adam K. Sieradzan, Department of Systems & Computational Biology [New York], Albert Einstein College of Medicine [New York], Department of Biochemistry [New York], Institut für Biotechnologie [berlin], Technische Universität Berlin (TU), Department of Computer Science [Davis] (UC Davis), University of California [Davis] (UC Davis), University of California-University of California, Department of Environmental Analytics [Univ Gdańsk], Faculty of Chemistry [Univ Gdańsk], University of Gdańsk (UG)-University of Gdańsk (UG), Genome Center [UC Davis], Algorithms for Modeling and Simulation of Nanosystems (NANO-D), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Moscow Institute of Physics and Technology [Moscow] (MIPT), Skolkovo Institute of Science and Technology [Moscow] (Skoltech), Department of Computer Science [ETH Zürich] (D-INFK), Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich] (ETH Zürich), Institute of Molecular Systems Biology [Zurich], Korea Institute for Advanced 
Study (KIAS), Institute of Mathematics [Univ Gdańsk], University of Gdańsk (UG), Wellcome Trust Centre for Cell Biology, University of Edinburgh, Technical University of Berlin / Technische Universität Berlin (TU), Department of Computer Science [Univ California Davis] (CS - UC Davis), University of California (UC)-University of California (UC), Algorithms for Modeling and Simulating Nanosystems [2018-...] (NANO-D-POST [2018-2020]), and Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
CASP13 ,Models, Molecular ,Computer science ,Protein Conformation ,Context (language use) ,Computational biology ,Routine practice ,010402 general chemistry ,Mass spectrometry ,Machine learning ,computer.software_genre ,01 natural sciences ,Biochemistry ,Article ,[PHYS.PHYS.PHYS-COMP-PH]Physics [physics]/Physics [physics]/Computational Physics [physics.comp-ph] ,03 medical and health sciences ,Protein structure ,chemical-crosslink-assisted protein structure modeling ,Structural Biology ,Tandem Mass Spectrometry ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Chemistry ,business.industry ,030302 biochemistry & molecular biology ,Model protein ,Computational Biology ,Proteins ,Reproducibility of Results ,0104 chemical sciences ,[SDV.BBM.BP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biophysics ,Workflow ,Cross-Linking Reagents ,Models, Chemical ,chemical crosslinking/mass spectrometry ,Critical assessment ,Artificial intelligence ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,business ,Protein structure modeling ,computer ,Function (biology) ,Algorithms ,Chromatography, Liquid - Abstract
With the advance of experimental procedures, obtaining chemical crosslinking information is becoming a fast and routine practice. Information on crosslinks can greatly enhance the accuracy of protein structure modeling. Here, we review the current state of the art in modeling protein structures with the assistance of experimentally determined chemical crosslinks within the framework of the 13th meeting of Critical Assessment of Structure Prediction approaches. This largest‐to‐date blind assessment reveals benefits of using data assistance in difficult-to-model protein structure prediction cases. However, in a broader context, it also suggests that with the unprecedented advance in accuracy to predict contacts in recent years, experimental crosslinks will be useful only if their specificity and accuracy are further improved and they are better integrated into computational workflows.
- Published
- 2019
28. De Novo Protein Structure Modeling Tool MAINMAST Enhanced for Multiple Chain Complexes and Bound Ligands
- Author
-
Genki Terashi and Daisuke Kihara
- Subjects
Chain (algebraic topology) ,Chemistry ,Stereochemistry ,Biophysics ,Protein structure modeling - Published
- 2020
29. Loss of Function in Zeaxanthin Epoxidase of
- Author
-
Minjae Kim, Jisu Kang, Yongsoo Kang, Beom Sik Kang, and EonSeon Jin
- Subjects
Models, Molecular ,Algal Proteins ,protein structure modeling ,Article ,zeaxanthin epoxidase ,Chlorophyta ,Loss of Function Mutation ,Zeaxanthins ,Catalytic Domain ,Dunaliella tertiolecta ,Microalgae ,Point Mutation ,Amino Acid Sequence ,marine microalga ,Oxidoreductases - Abstract
The zea1 mutant of marine microalga Dunaliella tertiolecta accumulates zeaxanthin under normal growth conditions, and its phenotype has been speculated to be related to zeaxanthin epoxidase (ZEP). In this study, we isolated the ZEP gene from both wild-type D. tertiolecta and the mutant. We found that the zea1 mutant has a point mutation of the 1337th nucleotide of the ZEP sequence (a change from guanine to adenine), resulting in a change of glycine to aspartate in a highly conserved region in the catalytic domain. Similar expression levels of ZEP mRNA and protein in both wild-type and zea1 were confirmed by using qRT-PCR and western blot analysis, respectively. Additionally, the enzyme activity analysis of ZEPs in the presence of cofactors showed that the inactivation of ZEP in zea1 was not caused by deficiency in the levels of cofactors. From the predicted three-dimensional ZEP structure of zea1, we observed a conformational change on the substrate-binding site in the ZEP. A comparative analysis of the ZEP structures suggested that the conformational change induced by a single amino acid mutation might impact the interaction between the substrate and substrate-binding site, resulting in loss of zeaxanthin epoxidase function.
- Published
- 2018
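The zeaxanthin epoxidase abstract above reports a guanine-to-adenine change at the 1337th nucleotide that turns a glycine into an aspartate. The short sketch below works through the codon arithmetic behind such a statement. It assumes the numbering is 1-based and starts at the first nucleotide of the coding sequence, which the abstract does not state; the function names are illustrative, and the resulting residue number is therefore a derived guess, not a value given in the source.

```python
# Partial standard genetic code: only the codons needed for this example.
AMINO_ACID = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu",
}

def locate_in_codon(nt_position):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    codon_number = (nt_position - 1) // 3 + 1
    offset = (nt_position - 1) % 3 + 1
    return codon_number, offset

def second_base_g_to_a(codon):
    """Apply a G->A substitution at the middle base of a codon."""
    if codon[1] != "G":
        raise ValueError("middle base is not G")
    return codon[0] + "A" + codon[2]

codon_number, offset = locate_in_codon(1337)
print(codon_number, offset)  # 446 2

# Effect of a middle-base G->A change on each glycine codon.
outcomes = {
    codon: AMINO_ACID[second_base_g_to_a(codon)]
    for codon, aa in AMINO_ACID.items() if aa == "Gly"
}
print(outcomes)  # {'GGT': 'Asp', 'GGC': 'Asp', 'GGA': 'Glu', 'GGG': 'Glu'}
```

Under the stated numbering assumption, position 1337 is the middle base of codon 446, and a glycine-to-aspartate outcome is consistent with a wild-type GGT or GGC codon (GGU or GGC in the mRNA), since the other two glycine codons would give glutamate.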
30. Evaluation system and web infrastructure for the second cryo-EM model challenge
- Author
-
Andriy Kryshtafovych, Wah Chiu, Paul D. Adams, and Catherine L. Lawson
- Subjects
Models, Molecular ,0301 basic medicine ,Model challenge ,Evaluation system ,Interface (Java) ,Computer science ,Protein Conformation ,Biophysics ,Bioengineering ,computer.software_genre ,Article ,03 medical and health sciences ,Similarity (network science) ,Structural Biology ,Models ,Cryo-EM ,Structure (mathematical logic) ,Suite ,Cryoelectron Microscopy ,Proteins ,Molecular ,Visualization ,030104 developmental biology ,Networking and Information Technology R&D (NITRD) ,Data mining ,Protein structure modeling ,Biochemistry and Cell Biology ,computer ,Zoology ,Protein structure verification - Abstract
An evaluation system and a web infrastructure were developed for the second cryo-EM model challenge. The evaluation system includes tools to validate stereo-chemical plausibility of submitted models, check their fit to the corresponding density maps, estimate their overall and per-residue accuracy, and assess their similarity to reference cryo-EM or X-ray structures as well as other models submitted in this challenge. The web infrastructure provides a convenient interface for analyzing models at different levels of detail. It includes interactively sortable tables of evaluation scores for different subsets of models and different sublevels of structure organization, and a suite of visualization tools facilitating model analysis. The results are publicly accessible at http://model-compare.emdatabank.org.
- Published
- 2018
31. CONFOLD2: improved contact-driven ab initio protein structure modeling
- Author
-
Jianlin Cheng and Badri Adhikari
- Subjects
0301 basic medicine ,Protein Folding ,Fold (higher-order function) ,Protein Conformation ,Contacts ,Computer science ,030303 biophysics ,Ab initio ,lcsh:Computer applications to medicine. Medical informatics ,computer.software_genre ,Model selection ,Biochemistry ,User-Computer Interface ,03 medical and health sciences ,Structural Biology ,Databases, Protein ,lcsh:QH301-705.5 ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Internet ,Sequence ,030102 biochemistry & molecular biology ,Applied Mathematics ,Proteins ,A protein ,Function (mathematics) ,Protein structure prediction ,Computer Science Applications ,030104 developmental biology ,lcsh:Biology (General) ,CONFOLD ,lcsh:R858-859.7 ,Protein folding ,Data mining ,DNA microarray ,Protein structure modeling ,computer ,Algorithm ,Algorithms ,Software - Abstract
Background: Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support the contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. Results: We develop an improved contact-driven protein modeling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 is benchmarked on various datasets including CASP11 and 12 datasets with publicly available predicted contacts and yields better performance than the popular CONFOLD method. Conclusion: CONFOLD2 can quickly generate the top five structural models for a protein sequence when its secondary structure and contact predictions are at hand. CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/.
- Published
- 2018
32. Modeling the assembly order of multimeric heteroprotein complexes
- Author
-
Genki Terashi, Juan Esquivel-Rodríguez, Lenna X. Peterson, A. Roy, Woong-Hee Shin, Charles Christoffer, Yoichiro Togawa, and Daisuke Kihara
- Subjects
0301 basic medicine ,Computer science ,Complex formation ,Protein Structure Prediction ,Biochemistry ,Molecular Docking Simulation ,0302 clinical medicine ,Computational Chemistry ,Protein structure ,Protein Interaction Mapping ,Macromolecular Structure Analysis ,Databases, Protein ,lcsh:QH301-705.5 ,Free Energy ,0303 health sciences ,Ecology ,Physics ,Protein structure prediction ,Chemistry ,Order (biology) ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Molecular Mechanics ,Thermodynamics ,Experimental methods ,Protein Structure Determination ,Biological system ,Algorithms ,Protein Binding ,Research Article ,Cholera Toxin ,Protein Structure ,Multiprotein complex ,Chemical physics ,Protein subunit ,Protein domain ,Biophysics ,Protein–protein interaction ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Protein Domains ,Genetics ,Humans ,Protein Interactions ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Models, Statistical ,Helicobacter pylori ,Computational Biology ,Proteins ,Biology and Life Sciences ,Protein Complexes ,Dimers (Chemical physics) ,030104 developmental biology ,lcsh:Biology (General) ,Protein-Protein Interactions ,Docking (molecular) ,Path (graph theory) ,Protein structure modeling ,030217 neurology & neurosurgery ,Software - Abstract
Protein-protein interactions are the cornerstone of numerous biological processes. Although an increasing number of protein complex structures have been determined using experimental methods, relatively few studies have been performed to determine the assembly order of complexes. In addition to the insights into the molecular mechanisms of biological function provided by the structure of a complex, knowing the assembly order is important for understanding the process of complex formation. Assembly order is also practically useful for constructing subcomplexes as a step toward solving the entire complex experimentally, designing artificial protein complexes, and developing drugs that interrupt a critical step in the complex assembly. There are several experimental methods for determining the assembly order of complexes; however, these techniques are resource-intensive. Here, we present a computational method that predicts the assembly order of protein complexes by building the complex structure. The method, named Path-LZerD, uses a multimeric protein docking algorithm that assembles a protein complex structure from individual subunit structures and predicts assembly order by observing the simulated assembly process of the complex. Benchmarked on a dataset of complexes with experimental evidence of assembly order, Path-LZerD was successful in predicting the assembly pathway for the majority of the cases. Moreover, when compared with a simple approach that infers the assembly path from the buried surface area of subunits in the native complex, Path-LZerD has the strong advantage that it can be used for cases where the complex structure is not known. The path prediction accuracy decreased when starting from unbound monomers, particularly for larger complexes of five or more subunits, for which only a part of the assembly path was correctly identified. As the first method of its kind, Path-LZerD opens a new area of computational protein structure modeling and will be an indispensable approach for studying protein complexes.
Author summary: Protein-protein interactions, particularly those involving multiple proteins, are the cornerstone of numerous biological processes. Although an increasing number of multi-chain protein complex structures have been determined, fewer studies have been performed to determine the assembly order of complexes. Knowing the assembly order of a complex provides insights into the process of complex formation. Assembly order is also practically useful for reconstructing and determining the structure of a subcomplex of a large protein complex. It also has important applications, including designing artificial protein complexes and drugs that prevent the assembly of protein complexes. We present a computational method, Path-LZerD, which predicts the assembly order of a protein complex by simulating its assembly process. This is the first method of this kind. A strong advantage of Path-LZerD is that the assembly order can be predicted even when the overall complex structure is not known. Path-LZerD opens a new area of computational protein structure modeling and will be an indispensable approach for studying protein complexes.
- Published
- 2018
33. Comprehensive identification of disulfide bonds using non-specific proteinase K digestion and CID-cleavable crosslinking analysis methodology for Orbitrap LC/ESI-MS/MS data
- Author
-
Jason J. Serpa, Evgeniy V. Petrotchenko, Karl A.T. Makepeace, and Christoph H. Borchers
- Subjects
Spectrometry, Mass, Electrospray Ionization ,Lc esi ms ms ,macromolecular substances ,Mass spectrometry ,Orbitrap ,General Biochemistry, Genetics and Molecular Biology ,law.invention ,03 medical and health sciences ,Digestion (alchemy) ,Non specific ,Tandem Mass Spectrometry ,law ,Humans ,Cysteine ,Disulfides ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Chromatography ,biology ,Chemistry ,030302 biochemistry & molecular biology ,technology, industry, and agriculture ,Disulfide bond ,Proteinase K ,Cross-Linking Reagents ,biology.protein ,Endopeptidase K ,Protein structure modeling ,Chromatography, Liquid - Abstract
Disulfide bonds are valuable constraints in protein structure modeling. The Cys–Cys disulfide bond undergoes specific fragmentation under CID and, therefore, can be considered as a CID-cleavable crosslink. We have recently reported on the benefits of using non-specific digestion with proteinase K for inter-peptide crosslink determination. Here, we describe an updated application of our CID-cleavable crosslink analysis software and our crosslinking analysis with non-specific digestion methodology for the robust and comprehensive determination of disulfide bonds in proteins, using Orbitrap LC/ESI-MS/MS data.
- Published
- 2015
34. ICOSA: A Distance-Dependent, Orientation-Specific Coarse-Grained Contact Potential for Protein Structure Modeling
- Author
-
Wessam Elhefnawy, Yaohang Li, Yun Han, and Lin Chen
- Subjects
Models, Molecular ,Protein Conformation ,Icosahedral symmetry ,Knowledge Bases ,010402 general chemistry ,01 natural sciences ,03 medical and health sciences ,Sequence Analysis, Protein ,Structural Biology ,Protein Interaction Mapping ,Local coordinates ,Protein Interaction Maps ,Statistical physics ,Caspase 10 ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Chemistry ,Computational Biology ,Ideal gas ,0104 chemical sciences ,Crystallography ,Protein folding ,Protein structure modeling ,Algorithms ,Software - Abstract
The relative distance and orientation in contacting residue pairs play a significant role in protein folding and stabilization. We devise a new knowledge-based, coarse-grained contact potential, called ICOSA, which correlates inter-residue contact distance and orientation in evaluating pair-wise inter-residue interactions. The rationale of our approach is to establish icosahedral local coordinates to estimate the statistical residue contact distributions in all spherical triangular shells within a sphere. We extend the theory of the finite ideal gas reference state to icosahedral local coordinates. ICOSA incorporates long-range contact interactions, which are critical to its sensitivity and are statistically justified. Using only backbone atom information, ICOSA is at least comparable to all-atom, fine-grained potentials such as Rosetta, DFIRE, I-TASSER, and OPUS in discriminating near-native from misfolded protein conformations in the Rosetta and I-TASSER protein decoy sets. ICOSA also outperforms a set of widely used coarse-grained potentials and is comparable to all-atom, fine-grained potentials in identifying CASP10 models.
- Published
- 2015
35. Prediction of the structures of helical membrane proteins based on a minimum unfavorable contacts approach
- Author
-
Siladitya Padhi, U. Deva Priyakumar, and Siddabattula Ramakrishna
- Subjects
Models, Molecular ,Protein Conformation ,Chemistry ,Membrane Proteins ,Sampling (statistics) ,General Chemistry ,Computational Mathematics ,Crystallography ,Models, Chemical ,Membrane protein ,Computer Simulation ,Lipid bilayer ,Protein structure modeling ,Conformational sampling ,Biological system ,Native structure ,Software ,Ion channel - Abstract
An understanding of structure-function relationships of membrane proteins continues to be a challenging problem, owing to the difficulty in obtaining their structures experimentally. This study suggests a method for modeling membrane protein structures that can be used to generate a reliable initial conformation prior to the use of other approaches for sampling conformations. It involves optimizing the orientation of hydrophilic residues so as to minimize unfavorable contacts with the hydrophobic tails of the lipid bilayer. Starting with the optimized initial conformation for three different proteins modeled based on this method, two independent approaches have been used for sampling the conformational space of the proteins. Both approaches are able to predict structures reasonably close to experimental structures, indicating that the initial structure enables the sampling of conformations that are close to the native structure. Possible improvements in the method for making it broadly applicable to helical membrane proteins are discussed.
- Published
- 2015
36. Protein Structure Modeling
- Author
-
Hsueh-Fen Juan and Chia-Hsien Lee
- Subjects
Chemistry ,Biophysics ,Protein structure modeling
- Published
- 2017
37. OPUS-CSF: A C-atom-based Scoring Function for Ranking Protein Structural Models
- Author
-
Qinghua Wang, Gang Xu, Jianpeng Ma, Tianwu Zang, and Tianqi Ma
- Subjects
Models, Molecular ,Correlation coefficient ,Protein Data Bank (RCSB PDB) ,010402 general chemistry ,Bioinformatics ,01 natural sciences ,protein structure modeling ,Ranking (information retrieval) ,03 medical and health sciences ,protein folding ,scoring function ,Boltzmann's entropy formula ,Databases, Protein ,030304 developmental biology ,Mathematics ,0303 health sciences ,Tools for Protein Science ,business.industry ,decoy recognition ,Proteins ,Pattern recognition ,Function (mathematics) ,coarse‐graining ,0104 chemical sciences ,Artificial intelligence ,business ,Decoy ,Software - Abstract
Summary: We report a C-atom-based scoring function, named OPUS-CSF, for ranking protein structural models. Rather than using the traditional Boltzmann formula, we built a scoring function (CSF score) based on the native distributions (analyzed through the entire PDB) of coordinate components of mainchain C atoms on selected residues of peptide segments of 5, 7, 9, and 11 residues in length. In testing OPUS-CSF on decoy recognition, it recognized at most 257 native structures out of 278 targets in 11 commonly used decoy sets, significantly more than other popular all-atom empirical potentials. The average correlation coefficient with TM-score was also comparable with those of other potentials. OPUS-CSF is a highly coarse-grained scoring function, which requires only partial mainchain information as input and is very fast. Thus it is suitable for applications at an early stage of structure building.
- Published
- 2017
38. Modeling disordered protein interactions from biophysical principles
- Author
-
A. Roy, Genki Terashi, Daisuke Kihara, Lenna X. Peterson, and Charles Christoffer
- Subjects
Proteomics ,Models, Molecular ,0301 basic medicine ,Protein Structure Prediction ,Molecular Dynamics ,Biochemistry ,Database and Informatics Methods ,Mice ,Computational Chemistry ,Sequence Analysis, Protein ,Macromolecular Structure Analysis ,lcsh:QH301-705.5 ,Ecology ,Proteomic Databases ,Chemistry ,Physics ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Structural Proteins ,Protein Structure Determination ,Experimental methods ,Research Article ,Protein Binding ,Protein Structure ,Biophysics ,Computational biology ,Research and Analysis Methods ,Protein–protein interaction ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Genetics ,Animals ,Humans ,Amino Acid Sequence ,Binding site ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Binding Sites ,Biology and Life Sciences ,Proteins ,Computational Biology ,Protein tertiary structure ,Intrinsically Disordered Proteins ,Biological Databases ,030104 developmental biology ,lcsh:Biology (General) ,Docking (molecular) ,Protein structure modeling - Abstract
Disordered protein-protein interactions (PPIs), those involving a folded protein and an intrinsically disordered protein (IDP), are prevalent in the cell, including in important signaling and regulatory pathways. IDPs do not adopt a single dominant structure in isolation but often become ordered upon binding. To aid understanding of the molecular mechanisms of disordered PPIs, it is crucial to obtain the tertiary structure of the PPIs. However, experimental methods have difficulty in solving disordered PPIs and existing protein-protein and protein-peptide docking methods are not able to model them. Here we present a novel computational method, IDP-LZerD, which models the conformation of a disordered PPI by considering the biophysical binding mechanism of an IDP to a structured protein, whereby a local segment of the IDP initiates the interaction and subsequently the remaining IDP regions explore and coalesce around the initial binding site. On a dataset of 22 disordered PPIs with IDPs up to 69 amino acids, successful predictions were made for 21 bound and 18 unbound receptors. The successful modeling provides additional support for biophysical principles. Moreover, the new technique significantly expands the capability of protein structure modeling and provides crucial insights into the molecular mechanisms of disordered PPIs.
Author summary: A substantial fraction of the proteins encoded in genomes are intrinsically disordered proteins (IDPs), which lack a single stable structure in the native state. IDPs serve many functions including mediating protein-protein interactions (PPIs). Such disordered PPIs are prevalent in important regulatory pathways, including many interactions of the tumor suppressor protein p53. To elucidate the molecular mechanisms of disordered PPIs, obtaining tertiary structure information is essential; however, they are difficult to study with experimental techniques and existing computational protein-protein and protein-peptide modeling methods are unable to model disordered PPIs. Here we present a novel computational method for modeling the structure of disordered PPIs, which is the first of this sort. The method, IDP-LZerD, is designed to follow a known biophysical picture of the mechanism of how IDPs interact with structured proteins. IDP-LZerD successfully modeled the majority of disordered PPIs tested. This technique opens up new possibilities for structural studies of IDPs and their interactions.
- Published
- 2017
39. Docking Covalent Inhibitors: A Parameter Free Approach To Pose Prediction and Scoring
- Author
-
Ramy Farid, Edward Harder, Jeremy R. Greenwood, Kenneth W. Borrelli, Tyler Day, Kai Zhu, and Robert Abel
- Subjects
Virtual screening ,Protein Conformation ,Computer science ,General Chemical Engineering ,General Chemistry ,Library and Information Sciences ,Crystallography, X-Ray ,Ligands ,Computer Science Applications ,Molecular Docking Simulation ,Structure-Activity Relationship ,Protein–ligand docking ,Computational chemistry ,Docking (molecular) ,Searching the conformational space for docking ,Covalent bond ,Drug Discovery ,Pose prediction ,Enzyme Inhibitors ,Protein structure modeling - Abstract
Although many popular docking programs include a facility to account for covalent ligands, large-scale systematic docking validation studies of covalent inhibitors have been sparse. In this paper, we present the development and validation of a novel approach for docking and scoring covalent inhibitors, which consists of conventional noncovalent docking, heuristic formation of the covalent attachment point, and structural refinement of the protein-ligand complex. This approach combines the strengths of the docking program Glide and the protein structure modeling program Prime and does not require any parameter fitting for the study of additional covalent reaction types. We first test this method by predicting the native binding geometry of 38 covalently bound complexes. The average RMSD of the predicted poses is 1.52 Å, and 76% of test set inhibitors have an RMSD of less than 2.0 Å. In addition, the apparent affinity score constructed herein is tested on a virtual screening study and the characterization of the SAR properties of two different series of congeneric compounds with satisfactory success.
- Published
- 2014
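The docking abstract above reports an average pose RMSD and the share of inhibitors under 2.0 Å. The sketch below shows how those two summary numbers are typically computed. It assumes that predicted and native ligand atoms are listed in the same order, that both poses already sit in the same receptor frame (no superposition is performed), and that "less than 2.0 Å" is a strict inequality. The function names are illustrative and are not part of Glide or Prime.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms, without refitting."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same atom count")
    sq_sum = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pred, ref)
    )
    return math.sqrt(sq_sum / len(pred))


def success_rate(rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of poses whose RMSD is strictly below the cutoff (in Å)."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)
```

Applied to a test set, the mean of the `pose_rmsd` values and `success_rate` over them give the two kinds of figures quoted in the abstract.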
40. On the internal correlations of protein sequences probed by non-alignment methods: Novel signatures for drug and antibody targets via the Burrows-Wheeler Transform
- Author
-
Daniel J. Graham and Brian P. Robinson
- Subjects
Non alignment ,0303 health sciences ,Class (computer programming) ,Burrows–Wheeler transform ,Computer science ,business.industry ,Process Chemistry and Technology ,010401 analytical chemistry ,Model parameters ,computer.software_genre ,01 natural sciences ,0104 chemical sciences ,Computer Science Applications ,Analytical Chemistry ,03 medical and health sciences ,Software ,Database search engine ,Data mining ,Protein structure modeling ,business ,computer ,Spectroscopy ,030304 developmental biology ,Complement (set theory) - Abstract
It is of long-standing interest to probe protein primary structures by their codes alone with minimal assumptions and model parameters. Approaches over decades have looked to the mathematics of digital information, most prominently alignment and database search algorithms. We follow an alternative line of inquiry by directing the Burrows-Wheeler transform (BWT) to archetypal sequences and sets. The motivation overlaps with bioinformatics and pharmaceutical chemistry: to better comprehend protein structure information with applications to drug and antibody targets. The approach, however, does not concentrate on sequences per se, but rather their information-conserved transforms. We demonstrate how such transforms enable obscure primary structure correlations to rise to the surface. The methodology further leverages the assembly information in sequences to provide class databases for comparisons. The databases illuminate additional correlations that are nuanced and characteristic. We illustrate the workings of BWT followed by data for archetypal drug and antibody targets. Our purpose is to establish new metrics and signatures for screening targets that complement ones from bioinformatics and protein structure modeling. The programming and analysis are straightforward and well-accessible to data researchers. Further, the software is freely available from the authors on request.
- Published
- 2019
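Since the abstract above rests on the Burrows-Wheeler transform being information-conserving, a minimal sketch of the textbook transform and its inverse may help. It uses the naive sorted-rotations construction and assumes a "$" end marker that does not occur in the sequence. It is not the authors' software, which the abstract says is available from them on request.

```python
def bwt(seq: str, end: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (fine for short sequences)."""
    if end in seq:
        raise ValueError("end marker must not occur in the sequence")
    s = seq + end
    # Sort all cyclic rotations and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last_column: str, end: str = "$") -> str:
    """Recover the original sequence, showing that no information is lost."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to every row, then re-sort.
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]
```

The transform tends to group identical residues that share a following context, which is one way hidden correlations in a primary structure can become visible. The round trip `inverse_bwt(bwt(seq)) == seq` holds for any sequence without the end marker.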
41. Critical assessment of methods of protein structure prediction (CASP) - round x
- Author
-
Andriy Kryshtafovych, Torsten Schwede, John Moult, Krzysztof Fidelis, and Anna Tramontano
- Subjects
0303 health sciences ,business.industry ,030302 biochemistry & molecular biology ,Biology ,Protein structure prediction ,Machine learning ,computer.software_genre ,Biochemistry ,Data science ,Field (computer science) ,03 medical and health sciences ,De novo protein structure prediction ,Structural Biology ,Modelling methods ,Prediction methods ,Critical assessment ,Artificial intelligence ,CASP ,business ,Protein structure modeling ,Molecular Biology ,computer ,030304 developmental biology - Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
- Published
- 2013
42. CASP prediction center infrastructure and evaluation measures in CASP10 and CASP ROLL
- Author
-
Bohdan Monastyrskyy, Krzysztof Fidelis, and Andriy Kryshtafovych
- Subjects
Structural Biology ,Quality assessment ,Computer science ,Data mining ,Protein structure prediction ,CASP ,computer.software_genre ,Protein structure modeling ,Molecular Biology ,Biochemistry ,computer ,Information exchange ,Visualization - Abstract
The Protein Structure Prediction Center at the University of California, Davis, supports the CASP experiments by identifying prediction targets, accepting predictions, performing standard evaluations, assisting independent CASP assessors, presenting and archiving results, and facilitating information exchange relating to CASP and structure prediction in general. We provide an overview of the CASP infrastructure implemented at the Center, and summarize standard measures used for evaluating predictions in the latest round of CASP. Several components were introduced or significantly redesigned for CASP10, in particular an improved assessors' common web-workspace; a Sphere Grinder visualization tool for analyzing local accuracy of predictions; brand new blocks for evaluating contact prediction and contact-assisted structure prediction; and expanded evaluation and visualization tools for tertiary structure, refinement and quality assessment. Technical aspects of conducting the CASP10 and CASP ROLL experiments and relevant statistics are also provided. Proteins 2014; 82(Suppl 2):7–13. © 2013 Wiley Periodicals, Inc.
- Published
- 2013
43. One contact for every twelve residues allows robust and accurate topology-level protein structure modeling
- Author
-
Frank DiMaio, Ray Yu-Ruei Wang, David Baker, David E. Kim, and Yifan Song
- Subjects
Protein structure ,Global distance test ,Structural Biology ,Chemistry ,Homology modeling ,Loop modeling ,Protein structure prediction ,Protein structure modeling ,Topology ,Ab initio prediction ,Molecular Biology ,Biochemistry - Abstract
A number of methods have been described for identifying pairs of contacting residues in protein three-dimensional structures, but it is unclear how many contacts are required for accurate structure modeling. The CASP10 assisted contact experiment provided a blind test of contact-guided protein structure modeling. We describe the models generated for these contact-guided prediction challenges using the Rosetta structure modeling methodology. For nearly all cases, the submitted models had the correct overall topology, and in some cases, they had near atomic-level accuracy; for example, the model of the 384-residue homo-oligomeric tetramer (Tc680o) had only 2.9 Å root-mean-square deviation (RMSD) from the crystal structure. Our results suggest that experimental and bioinformatic methods for obtaining contact information may need to generate only one correct contact for every 12 residues in the protein to allow accurate topology-level modeling.
- Published
- 2013
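The "one correct contact for every 12 residues" estimate in the abstract above translates into a simple count per target. The helper below is a worked-arithmetic sketch only. Its name and the choice to round up are assumptions made here, not something stated in the abstract.

```python
import math


def min_correct_contacts(n_residues: int, residues_per_contact: int = 12) -> int:
    """Correct contacts implied by a one-per-N-residues rule (rounded up)."""
    if n_residues <= 0 or residues_per_contact <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(n_residues / residues_per_contact)
```

For the 384-residue target mentioned in the abstract, the rule implies 384 / 12 = 32 correct contacts.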
44. The utility of artificially evolved sequences in protein threading and fold recognition
- Author
-
Michal Brylinski
- Subjects
Models, Molecular ,Statistics and Probability ,Protein Folding ,Biology ,Bioinformatics ,General Biochemistry, Genetics and Molecular Biology ,Databases, Protein ,General Immunology and Microbiology ,business.industry ,Applied Mathematics ,Proteins ,Pattern recognition ,General Medicine ,Protein structure prediction ,Template library ,Sensor fusion ,Sequence identity ,Template ,Modeling and Simulation ,Artificial intelligence ,Threading (protein sequence) ,General Agricultural and Biological Sciences ,business ,Protein structure modeling ,Sequence Alignment ,Functional genomics - Abstract
Template-based protein structure prediction plays an important role in Functional Genomics by providing structural models of gene products, which can be utilized by structure-based approaches to function inference. From a systems level perspective, the high structural coverage of gene products in a given organism is critical. Despite continuous efforts towards the development of more sensitive threading approaches, confident structural models cannot be constructed for a considerable fraction of proteins due to difficulties in recognizing low-sequence identity templates with a similar fold to the target. Here we introduce a new modeling stratagem, which employs a library of synthetic sequences to improve template ranking in fold recognition by sequence profile-based methods. We developed a new method for the optimization of generic protein-like amino acid sequences to stabilize the respective structures using a combined empirical scoring function, which is compatible with those commonly used in protein threading and fold recognition. We show that the artificially evolved sequences, whose average sequence identity to the wild-type sequences is as low as 13.8%, have significant capabilities to recognize the correct structures. Importantly, the quality of the corresponding threading alignments is comparable to those constructed using conventional wild-type approaches (the average TM-score is 0.48 and 0.54, respectively). Fold recognition that uses data fusion to combine ranks calculated for both wild-type and synthetic template libraries systematically improves the detection of structural analogs. Depending on the threading algorithm used, it yields on average 4–16% higher recognition rates than using the wild-type template library alone. Synthetic sequences artificially evolved for the template structures provide an orthogonal source of signal that could be exploited to detect those templates unrecognized by standard modeling techniques. It opens up new directions in the development of more sensitive threading methods with the enhanced capabilities of targeting difficult, midnight zone templates.
- Published
- 2013
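The abstract above combines template ranks from the wild-type and synthetic libraries by data fusion but does not state the fusion rule. The sketch below assumes a simple rank-sum (Borda-style) rule purely for illustration. The function name, the dictionary inputs, and the penalty for templates missing from one list are all hypothetical.

```python
from typing import Dict, List


def fuse_ranks(
    wild_type_ranks: Dict[str, int],
    synthetic_ranks: Dict[str, int],
) -> List[str]:
    """Order templates by the sum of their ranks in the two lists (1 = best)."""
    templates = set(wild_type_ranks) | set(synthetic_ranks)
    # A template absent from one list is treated as ranked just below everything.
    worst = len(templates) + 1
    fused = {
        t: wild_type_ranks.get(t, worst) + synthetic_ranks.get(t, worst)
        for t in templates
    }
    # Break ties alphabetically so the output is deterministic.
    return sorted(templates, key=lambda t: (fused[t], t))
```

A template ranked moderately by the wild-type profile but highly by the synthetic one can move to the top of the fused list, which is the kind of complementary signal the abstract describes.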
45. Coarse-Grained Contact Potential Helps Improve Fold Recognition Sensitivity in Template-Based Protein Structure Modeling
- Author
-
Yaohang Li and Maha Abdelrasoul
- Subjects
0301 basic medicine ,Computer science ,business.industry ,Protein Data Bank (RCSB PDB) ,A protein ,Pattern recognition ,computer.software_genre ,03 medical and health sciences ,030104 developmental biology ,Protein structure ,Data mining ,Template based ,Artificial intelligence ,Threading (protein sequence) ,business ,Protein structure modeling ,computer - Abstract
The recent growth in the number of experimentally determined protein structures in the Protein Data Bank (PDB) has provided hundreds of thousands of structural templates to support reliable template-based protein structure modeling. In this paper, we take advantage of large-scale data processing platforms so that sequence alignments with a large number of potential structural templates can be completed rapidly. Moreover, we present a new approach that uses ICOSA, a coarse-grained contact potential correlating inter-residue interaction distance and orientation, to estimate how favorably the target sequence fits the fold topology of a protein structural template. Combining the ICOSA score with the sequence profile alignment score generated by the MUSTER program improves the sensitivity of identifying the most appropriate template from a large number of possible templates. The effectiveness of this template selection approach has been demonstrated on the CASP11 targets.
- Published
- 2016
46. Conserved domain and structure analysis of a putative polyphosphate kinase from Buruli ulcer causing bacterium
- Author
-
Basharat, Zarrin
- Subjects
Buruli ulcer ,Polyphosphate kinase ,Structure analysis ,Protein domain ,medicine ,Computational analysis ,Homology modeling ,Computational biology ,Biology ,Protein structure modeling ,Bioinformatics ,medicine.disease ,Simple (philosophy) - Abstract
With the increasing sophistication of instruments and techniques, and the growing intricacy, scale and complexity of the problems being addressed, simple methods (especially computational biology techniques) are being overlooked, replaced or phased out. One such technique in the twilight of its survival is the simple computational analysis of protein sequences, i.e., property determination, homology modeling, etc. Manuscripts reporting solely this type of analysis face upfront rejection, although some exceptions might exist. Only some predatory or beginner journals might accept such publications. This continues despite the fact that simple, cost-effective, quick computational analysis of protein sequences has its merits and paves the way for further research. This report is basically an attempt to keep the dying venture of protein structure modeling alive.
- Published
- 2016
47. Critical assessment of methods of protein structure prediction: Progress and new directions in round XI
- Author
-
Moult, John, Fidelis, Krzysztof, Kryshtafovych, Andriy, Schwede, Torsten, and Tramontano, Anna
- Subjects
CASP ,Community wide experiment ,Protein structure modeling ,Biochemistry ,Structural Biology ,Molecular Biology
- Published
- 2016
48. Investigation of D2 Receptor–Agonist Interactions Using a Combination of Pharmacophore and Receptor Homology Modeling
- Author
-
Peder Svensson, Kristina Luthman, Marcus Malo, and Lars Brive
- Subjects
Agonist ,Models, Molecular ,Stereochemistry ,medicine.drug_class ,Molecular Sequence Data ,Biology ,Ligands ,Biochemistry ,protein structure modeling ,LigandScout ,GPCRs ,Structure-Activity Relationship ,Dopamine receptor D2 ,Drug Discovery ,medicine ,Structure–activity relationship ,Humans ,Computer Simulation ,Homology modeling ,Amino Acid Sequence ,General Pharmacology, Toxicology and Pharmaceutics ,Binding site ,G protein-coupled receptor ,Pharmacology ,Binding Sites ,Receptors, Dopamine D2 ,Organic Chemistry ,selectivity ,Brain ,Hydrogen Bonding ,Parkinson Disease ,Full Papers ,Structural Homology, Protein ,Dopamine Agonists ,pharmacophore modeling ,Molecular Medicine ,Pharmacophore ,Sequence Alignment ,Protein Binding - Abstract
A combined modeling approach was used to identify structural factors that underlie the structure–activity relationships (SARs) of full dopamine D2 receptor agonists and structurally similar inactive compounds. A 3D structural model of the dopamine D2 receptor was constructed, with the agonist (−)-(R)-2-OH-NPA present in the binding site during the modeling procedure. The 3D model was evaluated and compared with our previously published D2 agonist pharmacophore model. The comparison revealed an inconsistency between the projected hydrogen bonding feature (Ser-TM5) in the pharmacophore model and the TM5 region in the structure model. A new refined pharmacophore model was developed, guided by the shape of the binding site in the receptor model and with less emphasis on TM5 interactions. The combination of receptor and pharmacophore modeling also identified the importance of His393 (6.55) for agonist binding. This convergent 3D pharmacophore and protein structure modeling strategy is considered to be general and can be highly useful in less well-characterized systems to explore ligand–receptor interactions. The strategy has the potential to identify weaknesses in the individual models and thereby provides an opportunity to improve the discriminating predictivity of both pharmacophore searches and structure-based virtual screens.
- Published
- 2012
49. Template-based protein structure modeling using TASSER(VMT)
- Author
-
Jeffrey Skolnick and Hongyi Zhou
- Subjects
Models, Molecular ,Protein Conformation ,Computer science ,Proteins ,Protein structure prediction ,computer.software_genre ,Biochemistry ,Article ,Caspase 9 ,Template ,Ranking ,Structural Biology ,Template based ,Data mining ,Threading (protein sequence) ,Variable number ,Protein structure modeling ,Molecular Biology ,Algorithm ,computer ,Algorithms ,Parametric statistics - Abstract
Template-based protein structure modeling is commonly used for protein structure prediction. Based on the observation that multiple template-based methods often perform better than single template-based methods, we further explore the use of a variable number of multiple templates for a given target in the latest variant of TASSER, TASSER(VMT). We first develop an algorithm that improves the target-template alignment for a given template. The improved alignment, called the SP(3) alternative alignment, is generated by a parametric alignment method coupled with short TASSER refinement on models selected using knowledge-based scores. The refined top model is then structurally aligned to the template to produce the SP(3) alternative alignment. Templates identified using SP(3) threading are combined with the SP(3) alternative and HHsearch alignments to provide target alignments to each template. These template models are then grouped into sets containing a variable number of template/alignment combinations. For each set, we run short TASSER simulations to build full-length models. Then, the models from all sets of templates are pooled, and the top 20-50 models are selected using the FTCOM ranking method. These models are then subjected to a single longer TASSER refinement run for final prediction. We benchmarked our method by comparison with our previously developed approach, pro-sp(3)-TASSER, on a set with 874 easy and 318 hard targets. The average GDT-TS score improvements for the first model are 3.5 and 4.3% for easy and hard targets, respectively. When tested on the 112 CASP9 targets, our method improves the average GDT-TS scores as compared to pro-sp(3)-TASSER by 8.2 and 9.3% for the 80 easy and 32 hard targets, respectively. It also shows slightly better results than the top ranked CASP9 Zhang-Server, QUARK and HHpredA methods. The program is available for download at http://cssb.biology.gatech.edu/.
- Published
- 2011
50. The VSGB 2.0 model: A next generation energy model for high resolution protein structure modeling
- Author
-
Kai Zhu, Suwen Zhao, Richard A. Friesner, Jianing Li, Robert Abel, and Yixiang Cao
- Subjects
Hydrophobic effect ,Protein structure ,Structural Biology ,Computational chemistry ,Chemistry ,Extramural ,Side chain ,Crystallographic database ,High resolution ,Statistical physics ,Protein structure modeling ,Molecular Biology ,Biochemistry - Abstract
A novel energy model (VSGB 2.0) for high resolution protein structure modeling is described, which features an optimized implicit solvent model as well as physics-based corrections for hydrogen bonding, π-π interactions, self-contact interactions and hydrophobic interactions. Parameters of the VSGB 2.0 model were fit to a crystallographic database of 2239 single side chain predictions and 100 loop predictions of 11–13 residues. Combined with an advanced method of sampling and a robust algorithm for protonation state assignment, the VSGB 2.0 model was validated by predicting 115 super long loops up to 20 residues. Despite the dramatically increasing difficulty in reconstructing longer loops, a high accuracy was achieved: all of the lowest energy conformations have global backbone RMSDs better than 2.0 Å from the native conformations. Average global backbone RMSDs of the predictions are 0.51, 0.63, 0.70, 0.62, 0.80, 1.41, and 1.59 Å for 14, 15, 16, 17, 18, 19, and 20 residue loop predictions, respectively. When these results are corrected for possible statistical bias as explained in the text, the average global backbone RMSDs are 0.61, 0.71, 0.86, 0.62, 1.06, 1.67, and 1.59 Å. Given the precision and robustness of the calculations, we believe that the VSGB 2.0 model is suitable to tackle “real” problems, such as biological function modeling and structure-based drug discovery.
- Published
- 2011