34 results on '"Jones, DT"'
Search Results
2. Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13).
- Author
-
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, and Hassabis D
- Subjects
- Algorithms, Databases, Protein, Models, Molecular, Computational Biology methods, Neural Networks, Computer, Protein Conformation, Protein Folding, Proteins chemistry
- Abstract
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods. (© 2019 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals, Inc.)
- Published
- 2019
3. Prediction of interresidue contacts with DeepMetaPSICOV in CASP13.
- Author
-
Kandathil SM, Greener JG, and Jones DT
- Subjects
- Algorithms, Amino Acid Sequence genetics, Deep Learning, Machine Learning, Metagenome genetics, Neural Networks, Computer, Proteins chemistry, Proteins genetics, Sequence Analysis, Protein, Computational Biology, Protein Conformation, Proteins ultrastructure
- Abstract
In this article, we describe our efforts in contact prediction in the CASP13 experiment. We employed a new deep learning-based contact prediction tool, DeepMetaPSICOV (or DMP for short), together with new methods and data sources for alignment generation. DMP evolved from MetaPSICOV and DeepCov and combines the input feature sets used by these methods as input to a deep, fully convolutional residual neural network. We also improved our method for multiple sequence alignment generation and included metagenomic sequences in the search. We discuss successes and failures of our approach and identify areas where further improvements may be possible. DMP is freely available at: https://github.com/psipred/DeepMetaPSICOV. (© 2019 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals, Inc.)
- Published
- 2019
4. Recent developments in deep learning applied to protein structure prediction.
- Author
-
Kandathil SM, Greener JG, and Jones DT
- Subjects
- Models, Molecular, Neural Networks, Computer, Proteins chemistry, Proteins genetics, Proteins ultrastructure, Structural Homology, Protein, Computational Biology, Deep Learning, Protein Conformation
- Abstract
Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls. (© 2019 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals, Inc.)
- Published
- 2019
5. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints.
- Author
-
Greener JG, Kandathil SM, and Jones DT
- Subjects
- Algorithms, Animals, Binding Sites genetics, Humans, Proteome genetics, Proteome metabolism, Reproducibility of Results, Computational Biology methods, Deep Learning, Models, Molecular, Protein Conformation, Proteome chemistry, Proteomics methods
- Abstract
The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise in allowing accurate residue-residue contact prediction even for shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, which it uses to build models in an iterative fashion. DMPfold produces more accurate models than two popular methods for a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, confident models for 25% of these so-called dark families were produced in under a week on a small 200 core cluster. DMPfold provides models for 16% of human proteome UniProt entries without structures, generates accurate models with fewer than 100 sequences in some cases, and is freely available.
- Published
- 2019
6. Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks.
- Author
-
Wan C, Cozzetto D, Fa R, and Jones DT
- Subjects
- Humans, Computational Biology, Neural Networks, Computer, Protein Interaction Mapping, Protein Interaction Maps, Proteins genetics, Proteins metabolism
- Abstract
Protein-protein interaction network data provides valuable information that infers direct links between genes and their biological roles. This information brings a fundamental hypothesis for protein function prediction that interacting proteins tend to have similar functions. With the help of recently-developed network embedding feature generation methods and deep maxout neural networks, it is possible to extract functional representations that encode direct links between protein-protein interactions information and protein function. Our novel method, STRING2GO, successfully adopts deep maxout neural networks to learn functional representations simultaneously encoding both protein-protein interactions and functional predictive information. The experimental results show that STRING2GO outperforms other protein-protein interaction network-based prediction methods and one benchmark method adopted in a recent large scale protein function prediction competition. Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2019
7. Design of metalloproteins and novel protein folds using variational autoencoders.
- Author
-
Greener JG, Moffat L, and Jones DT
- Subjects
- Amino Acid Sequence genetics, Binding Sites, Deep Learning, Humans, Molecular Dynamics Simulation, Computational Biology, Metalloproteins chemistry, Protein Folding
- Abstract
The design of novel proteins has many applications but remains an attritional process with success in isolated cases. Meanwhile, deep learning technologies have exploded in popularity in recent years and are increasingly applicable to biology due to the rise in available data. We attempt to link protein design and deep learning by using variational autoencoders to generate protein sequences conditioned on desired properties. Potential copper and calcium binding sites are added to non-metal binding proteins without human intervention and compared to a hidden Markov model. In another use case, a grammar of protein structures is developed and used to produce sequences for a novel protein topology. One candidate structure is found to be stable by molecular dynamics simulation. The ability of our model to confine the vast search space of protein sequences and to scale easily has the potential to assist in a variety of protein design tasks.
- Published
- 2018
8. Improved protein contact predictions with the MetaPSICOV2 server in CASP12.
- Author
-
Buchan DWA and Jones DT
- Subjects
- Algorithms, Crystallography, X-Ray, Humans, Protein Interaction Domains and Motifs, Software, Computational Biology methods, Internet, Machine Learning, Models, Molecular, Neural Networks, Computer, Protein Conformation, Proteins chemistry
- Abstract
In this paper, we present the results for the MetaPSICOV2 contact prediction server in the CASP12 community experiment (http://predictioncenter.org). Over the 35 assessed Free Modelling target domains the MetaPSICOV2 server achieved a mean precision of 43.27%, a substantial increase relative to the server's performance in the CASP11 experiment. In the following paper, we discuss improvements to the MetaPSICOV2 server, covering both changes to the neural network and attempts to integrate contact predictions on a domain basis into the prediction pipeline. We also discuss some limitations in the CASP12 assessment which may have overestimated the performance of our method. (© 2017 The Authors Proteins: Structure, Function and Bioinformatics Published by Wiley Periodicals, Inc.)
- Published
- 2018
9. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.
- Author
-
Wan C, Lees JG, Minneci F, Orengo CA, and Jones DT
- Subjects
- Animals, Cluster Analysis, Computer Simulation, Drosophila Proteins analysis, Drosophila Proteins metabolism, Drosophila melanogaster genetics, Drosophila melanogaster metabolism, Models, Statistical, Phenotype, Transcriptome physiology, Computational Biology methods, Drosophila Proteins genetics, Drosophila melanogaster growth & development, Gene Expression Profiling methods, Transcriptome genetics
- Abstract
Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.
- Published
- 2017
10. EigenTHREADER: analogous protein fold recognition by efficient contact map threading.
- Author
-
Buchan DWA and Jones DT
- Subjects
- Algorithms, Computational Biology methods, Models, Molecular, Protein Folding, Sequence Analysis, Protein methods, Software
- Abstract
Motivation: Protein fold recognition when appropriate, evolutionarily-related, structural templates can be identified is often trivial and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made both to speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. Results: EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology based fold recognition methods. Availability and Implementation: All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from: http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/. Contact: d.t.jones@ucl.ac.uk. (© The Author(s) 2017. Published by Oxford University Press.)
- Published
- 2017
11. Crohn disease risk prediction-Best practices and pitfalls with exome data.
- Author
-
Giollo M, Jones DT, Carraro M, Leonardi E, Ferrari C, and Tosatto SCE
- Subjects
- Algorithms, Genetic Predisposition to Disease, High-Throughput Nucleotide Sequencing, Humans, Practice Guidelines as Topic, Exome Sequencing, Computational Biology methods, Crohn Disease genetics
- Abstract
The Critical Assessment of Genome Interpretation (CAGI) experiment is the first attempt to evaluate the state-of-the-art in genetic data interpretation. Among the proposed challenges, Crohn disease (CD) risk prediction has become the most classic problem spanning three editions. The scientific question is very hard: can anybody assess the risk to develop CD given the exome data alone? This is one of the ultimate goals of genetic analysis, which motivated most CAGI participants to look for powerful new methods. In the 2016 CD challenge, we implemented all the best methods proposed in the past editions. This resulted in 10 algorithms, which were evaluated fairly by CAGI organizers. We also used all the data available from CAGI 11 and 13 to maximize the amount of training samples. The most effective algorithms used known genes associated with CD from the literature. No method could evaluate effectively the importance of unannotated variants by using heuristics. As a downside, all CD datasets were strongly affected by sample stratification. This affected the performance reported by assessors. Therefore, we expect that future datasets will be normalized in order to remove population effects. This will improve methods comparison and promote algorithms focused on causal variants discovery. (© 2017 Wiley Periodicals, Inc.)
- Published
- 2017
12. Lessons from the CAGI-4 Hopkins clinical panel challenge.
- Author
-
Chandonia JM, Adhikari A, Carraro M, Chhibber A, Cutting GR, Fu Y, Gasparini A, Jones DT, Kramer A, Kundu K, Lam HYK, Leonardi E, Moult J, Pal LR, Searls DB, Shah S, Sunyaev S, Tosatto SCE, Yin Y, and Buckley BA
- Subjects
- Databases, Genetic, Genetic Predisposition to Disease, Genetic Testing, Humans, Phenotype, Computational Biology methods, Sequence Analysis, DNA methods
- Abstract
The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease, as well as one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient that was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication. (© 2017 Wiley Periodicals, Inc.)
- Published
- 2017
13. Predictions of Backbone Dynamics in Intrinsically Disordered Proteins Using De Novo Fragment-Based Protein Structure Predictions.
- Author
-
Kosciolek T, Buchan DWA, and Jones DT
- Subjects
- Crystallography, X-Ray, Magnetic Resonance Spectroscopy, Models, Molecular, Neural Networks, Computer, Computational Biology methods, Intrinsically Disordered Proteins chemistry, Molecular Biology methods
- Abstract
Intrinsically disordered proteins (IDPs) are a prevalent phenomenon with over 30% of human proteins estimated to have long disordered regions. Computational methods are widely used to study IDPs; however, nearly all treat disorder in a binary fashion, not accounting for the structural heterogeneity present in disordered regions. Here, we present a new de novo method, FRAGFOLD-IDP, which addresses this problem. Using 200 protein structural ensembles derived from NMR, we show that FRAGFOLD-IDP achieves superior results compared to methods which can predict related data (NMR order parameter, or crystallographic B-factor). FRAGFOLD-IDP produces very good predictions for 33.5% of cases and helps to get a better insight into the dynamics of the disordered ensembles. The results also show it is not necessary to predict the correct fold of the protein to reliably predict per-residue fluctuations. It implies that disorder is a local property and it does not depend on the fold. Our results are orthogonal to DynaMine, the only other method significantly better than the naïve prediction. We therefore combine these two using a neural network. FRAGFOLD-IDP enables better insight into backbone dynamics in IDPs and opens exciting possibilities for the design of disordered ensembles, disorder-to-order transitions, or design for protein dynamics.
- Published
- 2017
14. Computational Methods for Annotation Transfers from Sequence.
- Author
-
Cozzetto D and Jones DT
- Subjects
- Animals, Databases, Protein, Humans, Phylogeny, Proteins genetics, Proteins metabolism, Computational Biology methods, Gene Ontology, Molecular Sequence Annotation methods
- Abstract
Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long made use of very diverse data sources alone or in combination to predict protein function, with the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods accepting amino acid sequences as input and producing GO term assignments directly as outputs; the relevant biological and computational concepts are presented along with the advantages and limitations of individual approaches.
- Published
- 2017
15. An expanded evaluation of protein function prediction methods shows an improvement in accuracy.
- Author
-
Jiang Y, Oron TR, Clark WT, Bankapur AR, D'Andrea D, Lepore R, Funk CS, Kahanda I, Verspoor KM, Ben-Hur A, Koo da CE, Penfold-Brown D, Shasha D, Youngs N, Bonneau R, Lin A, Sahraeian SM, Martelli PL, Profiti G, Casadio R, Cao R, Zhong Z, Cheng J, Altenhoff A, Skunca N, Dessimoz C, Dogan T, Hakala K, Kaewphan S, Mehryary F, Salakoski T, Ginter F, Fang H, Smithers B, Oates M, Gough J, Törönen P, Koskinen P, Holm L, Chen CT, Hsu WL, Bryson K, Cozzetto D, Minneci F, Jones DT, Chapman S, Bkc D, Khan IK, Kihara D, Ofer D, Rappoport N, Stern A, Cibrian-Uhalte E, Denny P, Foulger RE, Hieta R, Legge D, Lovering RC, Magrane M, Melidoni AN, Mutowo-Meullenet P, Pichler K, Shypitsyna A, Li B, Zakeri P, ElShal S, Tranchevent LC, Das S, Dawson NL, Lee D, Lees JG, Sillitoe I, Bhat P, Nepusz T, Romero AE, Sasidharan R, Yang H, Paccanaro A, Gillis J, Sedeño-Cortés AE, Pavlidis P, Feng S, Cejuela JM, Goldberg T, Hamp T, Richter L, Salamov A, Gabaldon T, Marcet-Houben M, Supek F, Gong Q, Ning W, Zhou Y, Tian W, Falda M, Fontana P, Lavezzo E, Toppo S, Ferrari C, Giollo M, Piovesan D, Tosatto SC, Del Pozo A, Fernández JM, Maietta P, Valencia A, Tress ML, Benso A, Di Carlo S, Politano G, Savino A, Rehman HU, Re M, Mesiti M, Valentini G, Bargsten JW, van Dijk AD, Gemovic B, Glisic S, Perovic V, Veljkovic V, Veljkovic N, Almeida-E-Silva DC, Vencio RZ, Sharan M, Vogel J, Kansakar L, Zhang S, Vucetic S, Wang Z, Sternberg MJ, Wass MN, Huntley RP, Martin MJ, O'Donovan C, Robinson PN, Moreau Y, Tramontano A, Babbitt PC, Brenner SE, Linial M, Orengo CA, Rost B, Greene CS, Mooney SD, Friedberg I, and Radivojac P
- Subjects
- Algorithms, Databases, Protein, Gene Ontology, Humans, Molecular Sequence Annotation, Proteins genetics, Computational Biology, Proteins chemistry, Software, Structure-Activity Relationship
- Abstract
Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
- Published
- 2016
16. Accurate contact predictions using covariation techniques and machine learning.
- Author
-
Kosciolek T and Jones DT
- Subjects
- Amino Acid Sequence, Bacteria chemistry, Computational Biology methods, Computer Simulation, Databases, Protein, Humans, Internet, Neural Networks, Computer, Protein Folding, Protein Interaction Domains and Motifs, Protein Structure, Secondary, Sequence Alignment, Viruses chemistry, Computational Biology statistics & numerical data, Machine Learning, Models, Molecular, Models, Statistical, Proteins chemistry, Software
- Abstract
Here we present the results of residue-residue contact predictions achieved in CASP11 by the CONSIP2 server, which is based around our MetaPSICOV contact prediction method. On a set of 40 target domains with a median family size of around 40 effective sequences, our server achieved an average top-L/5 long-range contact precision of 27%. The MetaPSICOV method is based on a combination of classical contact prediction features, enhanced with three distinct covariation methods embedded in a two-stage neural network predictor. Some unique features of our approach are (1) the tuning between the classical and covariation features depending on the depth of the input alignment and (2) a hybrid approach to generate the deepest possible multiple-sequence alignments by combining jackHMMer and HHblits. We discuss the CONSIP2 pipeline and our results, and show that where the method underperformed, the major factor was relying on a fixed set of parameters for the initial sequence alignments and not attempting to perform domain splitting as a preprocessing step. Proteins 2016; 84(Suppl 1):145-151. (© 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.)
- Published
- 2016
17. FFPred 3: feature-based function prediction for all Gene Ontology domains.
- Author
-
Cozzetto D, Minneci F, Currant H, and Jones DT
- Subjects
- Humans, Computational Biology methods, Gene Ontology
- Abstract
Predicting protein function has been a major goal of bioinformatics for several decades, and it has gained fresh momentum thanks to recent community-wide blind tests aimed at benchmarking available tools on a genomic scale. Sequence-based predictors, especially those performing homology-based transfers, remain the most popular but increasing understanding of their limitations has stimulated the development of complementary approaches, which mostly exploit machine learning. Here we present FFPred 3, which is intended for assigning Gene Ontology terms to human protein chains, when homology with characterized proteins can provide little aid. Predictions are made by scanning the input sequences against an array of Support Vector Machines (SVMs), each examining the relationship between protein function and biophysical attributes describing secondary structure, transmembrane helices, intrinsically disordered regions, signal peptides and other motifs. This update features a larger SVM library that extends its coverage to the cellular component sub-ontology for the first time, prompted by the establishment of a dedicated evaluation category within the Critical Assessment of Functional Annotation. The effectiveness of this approach is demonstrated through benchmarking experiments, and its usefulness is illustrated by analysing the potential functional consequences of alternative splicing in human and their relationship to patterns of biological features.
- Published
- 2016
18. MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins.
- Author
-
Jones DT, Singh T, Kosciolek T, and Tetchner S
- Subjects
- Databases, Protein, Humans, Hydrogen Bonding, Protein Folding, Algorithms, Computational Biology methods, Proteins chemistry, Sequence Alignment methods, Sequence Analysis, Protein methods, Software
- Abstract
Motivation: Recent developments of statistical techniques to infer direct evolutionary couplings between residue pairs have rendered covariation-based contact prediction a viable means for accurate 3D modelling of proteins, with no information other than the sequence required. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Finally, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long range network of hydrogen bonds, including correctly assigning the donor and acceptor residues. Results: Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long range contacts, around 60% higher than PSICOV and around 40% better than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV is able to improve the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV. Availability and Implementation: MetaPSICOV is available as a free web server at http://bioinf.cs.ucl.ac.uk/MetaPSICOV. Raw data (predicted contact lists and 3D models) and source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/MetaPSICOV. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author 2014. Published by Oxford University Press.)
- Published
- 2015
19. DISOPRED3: precise disordered region predictions with annotated protein-binding activity.
- Author
-
Jones DT and Cozzetto D
- Subjects
- Binding Sites, Databases, Factual, Neural Networks, Computer, Protein Binding, Protein Interaction Domains and Motifs, Computational Biology, Intrinsically Disordered Proteins chemistry, Intrinsically Disordered Proteins metabolism, Molecular Sequence Annotation, Software
- Abstract
Motivation: A sizeable fraction of eukaryotic proteins contain intrinsically disordered regions (IDRs), which act in unfolded states or by undergoing transitions between structured and unstructured conformations. Over time, sequence-based classifiers of IDRs have become fairly accurate and currently a major challenge is linking IDRs to their biological roles from the molecular to the systems level. Results: We describe DISOPRED3, which extends its predecessor with new modules to predict IDRs and protein-binding sites within them. Based on recent CASP evaluation results, DISOPRED3 can be regarded as state of the art in the identification of IDRs, and our self-assessment shows that it significantly improves over DISOPRED2 because its predictions are more specific across the whole board and more sensitive to IDRs longer than 20 amino acids. Predicted IDRs are annotated as protein binding through a novel SVM based classifier, which uses profile data and additional sequence-derived features. Based on benchmarking experiments with full cross-validation, we show that this predictor generates precise assignments of disordered protein binding regions and that it compares well with other publicly available tools. (© The Author 2014. Published by Oxford University Press.)
- Published
- 2015
- Full Text
- View/download PDF
20. De novo structure prediction of globular proteins aided by sequence variation-derived contacts.
- Author
-
Kosciolek T and Jones DT
- Subjects
- Algorithms, Amino Acid Sequence, Databases, Protein, Models, Molecular, Protein Folding, Thermodynamics, Amino Acids chemistry, Computational Biology methods, Proteins chemistry, Sequence Analysis, Protein
- Abstract
The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts in determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.
- Published
- 2014
- Full Text
- View/download PDF
21. Evaluation of predictions in the CASP10 model refinement category.
- Author
-
Nugent T, Cozzetto D, and Jones DT
- Subjects
- Algorithms, Models, Statistical, Sequence Alignment, Computational Biology methods, Computational Biology standards, Models, Molecular, Protein Conformation, Proteins chemistry
- Abstract
Here we report on the assessment results of the third experiment to evaluate the state of the art in protein model refinement, where participants were invited to improve the accuracy of initial protein models for 27 targets. Using an array of complementary evaluation measures, we find that five groups performed better than the naïve (null) method, a marked improvement over CASP9, although only three were significantly better. The leading groups also demonstrated the ability to consistently improve both backbone and side chain positioning, while other groups reliably enhanced other aspects of protein physicality. The top-ranked group succeeded in improving the backbone conformation in almost 90% of targets, suggesting a strategy that for the first time in CASP refinement is successful in a clear majority of cases. A number of issues remain unsolved: the majority of groups still fail to improve the quality of the starting models; even successful groups are only able to make modest improvements; and no prediction is more similar to the native structure than to the starting model. Successful refinement attempts also often go unrecognized, as suggested by the relatively larger improvements when predictions not submitted as model 1 are also considered. (Copyright © The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.)
- Published
- 2014
- Full Text
- View/download PDF
22. Membrane protein orientation and refinement using a knowledge-based statistical potential.
- Author
-
Nugent T and Jones DT
- Subjects
- Algorithms, Knowledge Bases, Lipid Bilayers chemistry, Models, Molecular, Reproducibility of Results, Computational Biology methods, Membrane Potentials physiology, Membrane Proteins chemistry
- Abstract
Background: Recent increases in the number of deposited membrane protein crystal structures necessitate the use of automated computational tools to position them within the lipid bilayer. Identifying the correct orientation allows us to study the complex relationship between sequence, structure and the lipid environment, which is otherwise challenging to investigate using experimental techniques due to the difficulty in crystallising membrane proteins embedded within intact membranes. Results: We have developed a knowledge-based membrane potential, calculated by the statistical analysis of transmembrane protein structures, coupled with a combination of genetic and direct search algorithms, and demonstrate its use in positioning proteins in membranes, refinement of membrane protein models and in decoy discrimination. Conclusions: Our method is able to quickly and accurately orientate both alpha-helical and beta-barrel membrane proteins within the lipid bilayer, showing closer agreement with experimentally determined values than existing approaches. We also demonstrate both consistent and significant refinement of membrane protein models and the effective discrimination between native and decoy structures. Source code is available under an open source license from http://bioinf.cs.ucl.ac.uk/downloads/memembed/.
- Published
- 2013
- Full Text
- View/download PDF
23. A large-scale evaluation of computational protein function prediction.
- Author
-
Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Pandey G, Yunes JM, Talwalkar AS, Repo S, Souza ML, Piovesan D, Casadio R, Wang Z, Cheng J, Fang H, Gough J, Koskinen P, Törönen P, Nokso-Koivisto J, Holm L, Cozzetto D, Buchan DW, Bryson K, Jones DT, Limaye B, Inamdar H, Datta A, Manjari SK, Joshi R, Chitale M, Kihara D, Lisewski AM, Erdin S, Venner E, Lichtarge O, Rentzsch R, Yang H, Romero AE, Bhat P, Paccanaro A, Hamp T, Kaßner R, Seemayer S, Vicedo E, Schaefer C, Achten D, Auer F, Boehm A, Braun T, Hecht M, Heron M, Hönigschmid P, Hopf TA, Kaufmann S, Kiening M, Krompass D, Landerer C, Mahlich Y, Roos M, Björne J, Salakoski T, Wong A, Shatkay H, Gatzmann F, Sommer I, Wass MN, Sternberg MJ, Škunca N, Supek F, Bošnjak M, Panov P, Džeroski S, Šmuc T, Kourmpetis YA, van Dijk AD, ter Braak CJ, Zhou Y, Gong Q, Dong X, Tian W, Falda M, Fontana P, Lavezzo E, Di Camillo B, Toppo S, Lan L, Djuric N, Guo Y, Vucetic S, Bairoch A, Linial M, Babbitt PC, Brenner SE, Orengo C, Rost B, Mooney SD, and Friedberg I
- Subjects
- Algorithms, Animals, Databases, Protein, Exoribonucleases classification, Exoribonucleases genetics, Exoribonucleases physiology, Forecasting, Humans, Proteins chemistry, Proteins classification, Proteins genetics, Species Specificity, Computational Biology methods, Molecular Biology methods, Molecular Sequence Annotation, Proteins physiology
- Abstract
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
- Published
- 2013
- Full Text
- View/download PDF
24. Protein topology from predicted residue contacts.
- Author
-
Taylor WR, Jones DT, and Sadowski MI
- Subjects
- Algorithms, Amino Acid Sequence, Models, Biological, Models, Molecular, Molecular Dynamics Simulation, Molecular Sequence Data, Protein Binding genetics, Protein Binding physiology, Protein Folding, Protein Interaction Domains and Motifs genetics, Protein Structure, Secondary, Proteins chemistry, Sequence Alignment, Sequence Homology, Amino Acid, Computational Biology methods, Models, Statistical, Protein Conformation, Protein Interaction Domains and Motifs physiology
- Abstract
Residue contacts predicted from correlated positions in a multiple sequence alignment are often sparse and uncertain. To some extent, these limitations in the data can be overcome by grouping the contacts by secondary structure elements and enumerating the possible packing arrangements of these elements in a combinatorial manner. Strong interactions appear frequently but inconsistent interactions are down-weighted and missing interactions up-weighted. The resulting improved consistency in the predicted interactions has allowed the method to be successfully applied to proteins up to 200 residues in length, which is larger than any structure previously predicted using sequence data alone. (Copyright © 2011 The Protein Society.)
- Published
- 2012
- Full Text
- View/download PDF
25. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods.
- Author
-
Lise S, Archambeau C, Pontil M, and Jones DT
- Subjects
- Alanine chemistry, Artificial Intelligence, Binding Sites, Databases, Protein, Hydrogen Bonding, Proteins metabolism, Static Electricity, Thermodynamics, Computational Biology methods, Protein Interaction Mapping methods, Proteins chemistry
- Abstract
Background: Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG) measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition. Results: We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision and a recall of 56% and 65%, respectively. Conclusion: We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been applied separately to biomolecular problems, the results of our investigation indicate that there are substantial benefits to be gained by their integration.
- Published
- 2009
- Full Text
- View/download PDF
26. pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination.
- Author
-
Lobley A, Sadowski MI, and Jones DT
- Subjects
- Protein Folding, Proteins chemistry, Sequence Analysis, Protein methods, Computational Biology methods, Protein Structure, Tertiary, Proteins classification, Software
- Abstract
Motivation: Generation of structural models and recognition of homologous relationships for unannotated protein sequences are fundamental problems in bioinformatics. Improving the sensitivity and selectivity of methods designed for these two tasks therefore has downstream benefits for many other bioinformatics applications. Results: We describe the latest implementation of the GenTHREADER method for structure prediction on a genomic scale. The method combines profile-profile alignments with secondary-structure specific gap-penalties, classic pair- and solvation potentials using a linear combination optimized with a regression SVM model. We find this combination significantly improves both detection of useful templates and accuracy of sequence-structure alignments relative to other competitive approaches. We further present a second implementation of the protocol designed for the task of discriminating superfamilies from one another. This method, pDomTHREADER, is the first to incorporate both sequence and structural data directly in this task and improves sensitivity and selectivity over the standard version of pGenTHREADER and three other standard methods for remote homology detection.
- Published
- 2009
- Full Text
- View/download PDF
27. Introduction. Bioinformatics: from molecules to systems.
- Author
-
Jones DT, Sternberg MJ, and Thornton JM
- Subjects
- Proteins chemistry, Software, Systems Biology, Computational Biology
- Published
- 2006
- Full Text
- View/download PDF
28. Prediction of novel and analogous folds using fragment assembly and fold recognition.
- Author
-
Jones DT, Bryson K, Coleman A, McGuffin LJ, Sadowski MI, Sodhi JS, and Ward JJ
- Subjects
- Algorithms, Computer Simulation, Computers, Databases, Protein, Dimerization, Humans, Models, Molecular, Protein Conformation, Protein Folding, Protein Structure, Secondary, Protein Structure, Tertiary, Reproducibility of Results, Sequence Alignment, Software, Computational Biology methods, Proteomics methods
- Abstract
A number of new and newly improved methods for predicting protein structure developed by the Jones-University College London group were used to make predictions for the CASP6 experiment. Structures were predicted with a combination of fold recognition methods (mGenTHREADER, nFOLD, and THREADER) and a substantially enhanced version of FRAGFOLD, our fragment assembly method. Attempts at automatic domain parsing were made using DomPred and DomSSEA, which are based on a secondary structure parsing algorithm and, additionally for DomPred, a simple local sequence alignment scoring function. Disorder prediction was carried out using a new SVM-based version of DISOPRED. Attempts were also made at domain docking and "microdomain" folding in order to build complete chain models for some targets. (2005 Wiley-Liss, Inc.)
- Published
- 2005
- Full Text
- View/download PDF
29. Prediction of disordered regions in proteins from position specific score matrices.
- Author
-
Jones DT and Ward JJ
- Subjects
- Magnetic Resonance Spectroscopy, Models, Molecular, Reproducibility of Results, Computational Biology methods, Neural Networks, Computer, Protein Conformation, Proteins chemistry
- Abstract
We describe here the results of using a neural network based method (DISOPRED) for predicting disordered regions in 55 proteins in the 5th CASP experiment. A set of 715 highly resolved proteins with regions of disorder was used to train the network. The inputs to the network were derived from sequence profiles generated by PSI-BLAST. A post-filter was applied to the output of the network to prevent regions being predicted as disordered in regions of confidently predicted alpha helix or beta sheet structure. The overall two-state prediction accuracy for the method is very high (90%) but this is highly skewed by the fact that most residues are observed to be ordered. The overall Matthews' correlation coefficient for the submitted predictions is 0.34, which gives a more realistic impression of the overall accuracy of the method, though still indicates significant predictive power. (Copyright 2003 Wiley-Liss, Inc.)
- Published
- 2003
- Full Text
- View/download PDF
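The contrast drawn in the abstract above between two-state accuracy and the Matthews correlation coefficient (MCC) on a class-imbalanced problem can be sketched as follows. The residue counts are made-up illustrative numbers, not data from the paper, and the function name is hypothetical.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) from the four cells of a binary confusion matrix.

    Here "positive" stands for a disordered residue and "negative" for an ordered one.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is the usual convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Illustrative (invented) data set: 1000 residues, of which 100 are disordered.
# A trivial predictor that calls every residue "ordered":
trivial_acc, trivial_mcc = accuracy_and_mcc(tp=0, tn=900, fp=0, fn=100)

# A predictor that finds some disordered residues at the cost of a few false positives:
useful_acc, useful_mcc = accuracy_and_mcc(tp=40, tn=870, fp=30, fn=60)
```

With these invented counts the trivial predictor reaches 90% accuracy yet has an MCC of zero, while the second predictor has almost the same accuracy but a clearly positive MCC, which is why MCC is the more informative figure when most residues are ordered.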
30. Assembling novel protein folds from super-secondary structural fragments.
- Author
-
Jones DT and McGuffin LJ
- Subjects
- Algorithms, Models, Molecular, Protein Structure, Tertiary, Thermodynamics, Computational Biology methods, Protein Folding, Protein Structure, Secondary, Proteins chemistry
- Abstract
The results of applying a fragment-based protein tertiary structure prediction method to the prediction of 14 CASP5 target domains are described. The method is based on the assembly of supersecondary structural fragments taken from highly resolved protein structures using a simulated annealing algorithm. A number of good predictions for proteins with novel folds were produced, although not always as the first model. For two fold recognition targets, FRAGFOLD produced the most accurate model in both cases, despite the fact that the predictions were not based on a template structure. Although clear progress has been made in improving FRAGFOLD since CASP4, the ranking of final models still seems to be the main problem that needs to be addressed before the next CASP experiment. (Copyright 2003 Wiley-Liss, Inc.)
- Published
- 2003
- Full Text
- View/download PDF
31. Rapid protein domain assignment from amino acid sequence using predicted secondary structure.
- Author
-
Marsden RL, McGuffin LJ, and Jones DT
- Subjects
- Amino Acid Sequence, Protein Structure, Secondary, Protein Structure, Tertiary, Sensitivity and Specificity, Sequence Alignment, Computational Biology methods, Proteins chemistry
- Abstract
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (+/-20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.
- Published
- 2002
- Full Text
- View/download PDF
32. Protein structure prediction in genomics.
- Author
-
Jones DT
- Subjects
- Animals, Genome, Human, Humans, Internet, Models, Molecular, Protein Folding, Proteins physiology, Computational Biology, Genomics, Proteins chemistry, Proteins genetics
- Abstract
As the number of completely sequenced genomes rapidly increases, including now the complete Human Genome sequence, the post-genomic problems of genome-scale protein structure determination and the issue of gene function identification become ever more pressing. In fact, these problems can be seen as interrelated in that experimentally determining or predicting the structure of proteins encoded by genes of interest is one possible means to glean subtle hints as to the functions of these genes. The applicability of this approach to gene characterisation is reviewed, along with a brief survey of the reliability of large-scale protein structure prediction methods and the prospects for the development of new prediction methods.
- Published
- 2001
- Full Text
- View/download PDF
33. What are the baselines for protein fold recognition?
- Author
-
McGuffin LJ, Bryson K, and Jones DT
- Subjects
- Databases, Factual, Protein Structure, Secondary, Protein Structure, Tertiary, Reproducibility of Results, Sensitivity and Specificity, Computational Biology, Protein Folding
- Abstract
Motivation: What constitutes a baseline level of success for protein fold recognition methods? As fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein, ranging from the length of the sequence to a knowledge of its secondary structure, to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods and could these methods be used to construct viable hierarchical classifications? Experiments performed: A number of rapid automatic methods which score similarities between protein domains were devised and tested. These methods ranged from those that incorporated no secondary structure information, such as measuring absolute differences in sequence lengths, to more complex alignments of secondary structure elements. Each method was assessed for accuracy by comparison with the Class Architecture Topology Homology (CATH) classification. Methods were rated against both a random baseline fold assignment method as a lower control and FSSP as an upper control. Similarity trees were constructed in order to evaluate the accuracy of optimum methods at producing a classification of structure. Results: Using a rigorous comparison of methods with CATH, the random fold assignment method set a lower baseline of 11% true positives allowing for 3% false positives and FSSP set an upper benchmark of 47% true positives at 3% false positives. The optimum secondary structure alignment method used here achieved 27% true positives at 3% false positives. Using a less rigorous Critical Assessment of Structure Prediction (CASP)-like sensitivity measurement the random assignment achieved 6%, FSSP 59% and the optimum secondary structure alignment method 32%. Similarity trees produced by the optimum method illustrate that these methods cannot be used alone to produce a viable protein structural classification system. Conclusions: Simple methods that use perfect secondary structure information to assign folds cannot produce an accurate protein taxonomy; however, they do provide useful baselines for fold recognition. In terms of a typical CASP assessment our results suggest that approximately 6% of targets with folds in the databases could be assigned correctly by randomly guessing, and as many as 32% could be recognised by trivial secondary structure comparison methods, given knowledge of their correct secondary structures.
- Published
- 2001
- Full Text
- View/download PDF
34. A large-scale evaluation of computational protein function prediction
- Author
-
Christine A. Orengo, Liang Lan, Daniel W. A. Buchan, Jeffrey M. Yunes, Alberto Paccanaro, Yannick Mahlich, Enrico Lavezzo, Patricia C. Babbitt, Domenico Cozzetto, Cedric Landerer, Jari Björne, Esmeralda Vicedo, Robert Rentzsch, Rajendra Joshi, Hagit Shatkay, Nives Škunca, Zheng Wang, Tal Ronnen Oron, Ingolf Sommer, Amos Marc Bairoch, Mark Heron, Panče Panov, Daisuke Kihara, Wyatt T. Clark, Michael J.E. Sternberg, Steven E. Brenner, Sašo Džeroski, Burkhard Rost, Christian Schaefer, Karin Verspoor, Harshal Inamdar, Tapio Salakoski, Meghana Chitale, Alfonso E. Romero, Julian Gough, Fran Supek, Olivier Lichtarge, Dominik Achten, Serkan Erdin, Michael Kiening, Petri Törönen, Avik Datta, Iddo Friedberg, Thomas A. Hopf, Liisa Holm, Rita Casadio, Asa Ben-Hur, Tatjana Braun, Sean D. Mooney, Marco Falda, Kiley Graim, Michal Linial, Alexandra M. Schnoes, Christopher S. Funk, Rebecca Kaßner, Patrik Koskinen, Nemanja Djuric, Paolo Fontana, Predrag Radivojac, Tobias Wittkop, Kevin Bryson, Maximilian Hecht, Susanna Repo, Haixuan Yang, Artem Sokolov, Prajwal Bhat, Tobias Hamp, Jianlin Cheng, Mark N. Wass, Gaurav Pandey, Michael L Souza, Damiano Piovesan, Ameet Talwalkar, Stefan Seemayer, Eric Venner, Sunitha K Manjari, Fanny Gatzmann, Aalt D. J. van Dijk, Manfred Roos, Tomislav Šmuc, David T. Jones, Peter Hönigschmid, Ariane Boehm, Florian Auer, Jussi Nokso-Koivisto, Stefano Toppo, Slobodan Vucetic, Denis Krompass, Qingtian Gong, Cajo J. F. ter Braak, Andrew Wong, Barbara Di Camillo, Yiannis A. I. Kourmpetis, Andreas Martin Lisewski, Matko Bošnjak, Bhakti Limaye, Weidong Tian, Yuhong Guo, Xinran Dong, Hai Fang, Yuanpeng Zhou, Stefanie Kaufmann, and Biotechnology and Biological Sciences Research Council (BBSRC)
- Subjects
- Bioinformatics, computer.software_genre, Wiskundige en Statistische Methoden - Biometris, Biochemistry, ANNOTATION, 0302 clinical medicine, 10 Technology, Proteins/chemistry/classification/genetics/physiology, protein function, computational annotation, CAFA experiment, rna, Protein function prediction, NETWORK, Databases, Protein, database, 0303 health sciences, Sequence, Protein function, Settore BIO/11 - BIOLOGIA MOLECOLARE, GENE ONTOLOGY, 11 Medical And Health Sciences, Biometris, Molecular Sequence Annotation, annotation, Life Sciences & Biomedicine, Algorithms, Biotechnology, Biochemistry & Molecular Biology, DATABASE, GENOMES, Biology, Machine learning, SEQUENCE, Biochemical Research Methods, Article, Set (abstract data type), BIOS Applied Bioinformatics, 03 medical and health sciences, Annotation, Species Specificity, Animals, Humans, GOLD, ddc:576, Critical Assessment of Function Annotation, Mathematical and Statistical Methods - Biometris, Molecular Biology, 030304 developmental biology, Science & Technology, business.industry, Scale (chemistry), ta1182, Computational Biology, Proteins, Cell Biology, Computational Biology/methods, gold, sequence, 06 Biological Sciences, Exoribonucleases/classification/genetics/physiology, network, Exoribonucleases, Molecular Biology/methods, gene ontology, RNA, Artificial intelligence, ddc:004, genomes, business, computer, 030217 neurology & neurosurgery, Developmental Biology, Forecasting
- Abstract
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools.
- Published
- 2013
Discovery Service for Jio Institute Digital Library