139 results for "Structural Bioinformatics"
Search Results
2. An in-silico analysis of OGT gene association with diabetes mellitus.
- Author
- Ayodele, Abigail O., Udosen, Brenda, Oluwagbemi, Olugbenga O., Oladipo, Elijah K., Omotuyi, Idowu, Isewon, Itunuoluwa, Nash, Oyekanmi, Soremekun, Opeyemi, and Fatumo, Segun
- Subjects
- *DIABETES, *PROTEIN overexpression, *POST-translational modification, *STRUCTURAL bioinformatics, *DRUG analysis
- Abstract
O-GlcNAcylation is a nutrient-sensing post-translational modification process. This cycling process involves two primary proteins: the O-linked N-acetylglucosamine transferase (OGT) catalysing the addition, and the glycoside hydrolase OGA (O-GlcNAcase) catalysing the removal of the O-GlcNAc moiety on nucleocytoplasmic proteins. This process is necessary for various critical cellular functions. The O-linked N-acetylglucosamine transferase (OGT) gene produces the OGT protein. Several studies have shown the overexpression of this protein to have biological implications in metabolic diseases like cancer and diabetes mellitus (DM). This study retrieved 159 SNPs with clinical significance from the SNPs database. We probed the functional effects, stability profile, and evolutionary conservation of these to determine their fit for this research. We then identified 7 SNPs (G103R, N196K, Y228H, R250C, G341V, L367F, and C845S) with predicted deleterious effects across the four tools used (PhD-SNPs, SNPs&Go, PROVEAN, and PolyPhen2). Proceeding with this, we used ROBETTA, a homology modelling tool, to model the proteins with these point mutations and carried out a structural bioinformatics method, molecular docking, using the Glide model of the Schrödinger Maestro suite. We used a previously reported inhibitor of OGT, OSMI-1, as the ligand for these mutated protein models. As a result, very good binding affinities and interactions were observed between this ligand and the active site residues within 4 Å of OGT. We conclude that these mutation points may be used for further downstream analysis as drug targets for treating diabetes mellitus. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. BeEM: fast and faithful conversion of mmCIF format structure files to PDB format.
- Author
- Zhang, Chengxin
- Subjects
- *DATABASES, *PROTEIN structure, *STRUCTURAL bioinformatics, *BANKING industry, *SOURCE code, *PATTERN matching, *SYNTHETIC biology
- Abstract
Background: Although mmCIF is the current official format for deposition of protein and nucleic acid structures to the protein data bank (PDB) database, the legacy PDB format is still the primary supported format for many structural bioinformatics tools. Therefore, reliable software to convert mmCIF structure files to PDB files is needed. Unfortunately, existing conversion programs fail to correctly convert many mmCIF files, especially those with many atoms and/or long chain identifiers. Results: This study proposed BeEM, which converts any mmCIF format structure files to PDB format. BeEM conversion faithfully retains all atomic and chain information, including chain IDs with more than 2 characters, which are not supported by any existing mmCIF to PDB converters. The conversion speed of BeEM is at least ten times faster than existing converters such as MAXIT and Phenix. Part of the reason for the speed improvement is the avoidance of conversion between numerical values and text strings. Conclusion: BeEM is a fast and accurate tool for mmCIF-to-PDB format conversion, which is a common procedure in structural biology. The source code is available under the BSD licence at https://github.com/kad-ecoli/BeEM/. [ABSTRACT FROM AUTHOR]
- Published
- 2023
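The chain-ID limitation and the string-copying speed-up mentioned in the BeEM abstract both follow from the fixed-column layout of legacy PDB ATOM records. The sketch below is a minimal Python illustration and not BeEM's code: the function names `fit` and `atom_line` are hypothetical, all fields are assumed to arrive as the text read from the mmCIF file, and raising an error for a multi-character chain ID is only a way to show the limit, not how BeEM resolves it.

```python
def fit(text, width):
    """Right-justify text in a fixed-width field without any numeric parsing."""
    if len(text) > width:
        raise ValueError(f"{text!r} does not fit in {width} columns")
    return text.rjust(width)


def atom_line(serial, name, res_name, chain_id, res_seq,
              x, y, z, occupancy, b_factor, element):
    """Build one legacy PDB ATOM record from string fields (hypothetical helper)."""
    if len(chain_id) != 1:
        # The legacy format reserves a single column (22) for the chain ID.
        raise ValueError("legacy PDB has one column for the chain ID")
    if len(name) > 4:
        raise ValueError("atom name does not fit in columns 13-16")
    # Names shorter than 4 characters conventionally start in column 14.
    name_field = name if len(name) == 4 else (" " + name).ljust(4)
    return "".join([
        "ATOM  ",            # columns 1-6
        fit(serial, 5),      # columns 7-11
        " ",                 # column 12
        name_field,          # columns 13-16
        " ",                 # column 17, alternate location
        fit(res_name, 3),    # columns 18-20
        " ",                 # column 21
        chain_id,            # column 22
        fit(res_seq, 4),     # columns 23-26
        " " * 4,             # column 27 insertion code, columns 28-30 blank
        fit(x, 8), fit(y, 8), fit(z, 8),        # columns 31-54, copied as text
        fit(occupancy, 6), fit(b_factor, 6),    # columns 55-66
        " " * 10,            # columns 67-76
        fit(element, 2),     # columns 77-78
    ])


line = atom_line("1", "CA", "ALA", "A", "1",
                 "11.104", "6.134", "-6.504", "1.00", "0.00", "C")
print(line)
```

Because the coordinates are padded as text rather than parsed to floats and re-formatted, no numeric round trip takes place; this assumes the mmCIF values already carry a precision that fits the 8-column fields.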
4. 3DVizSNP: a tool for rapidly visualizing missense mutations identified in high throughput experiments in iCn3D.
- Author
- Sierk, Michael, Ratnayake, Shashikala, Wagle, Manoj M., Chen, Ben, Park, Brian, Wang, Jiyao, Youkharibache, Philippe, and Meerzaman, Daoud
- Subjects
- *MISSENSE mutation, *SINGLE nucleotide polymorphisms, *BANKING industry, *DATABASE management software, *DATABASES, *PYTHON programming language
- Abstract
Background: High throughput experiments in cancer and other areas of genomic research identify large numbers of sequence variants that need to be evaluated for phenotypic impact. While many tools exist to score the likely impact of single nucleotide polymorphisms (SNPs) based on sequence alone, the three-dimensional structural environment is essential for understanding the biological impact of a nonsynonymous mutation. Results: We present a program, 3DVizSNP, that enables the rapid visualization of nonsynonymous missense mutations extracted from a variant caller format file using the web-based iCn3D visualization platform. The program, written in Python, leverages REST APIs and can be run locally without installing any other software or databases, or from a webserver hosted by the National Cancer Institute. It automatically selects the appropriate experimental structure from the Protein Data Bank, if available, or the predicted structure from the AlphaFold database, enabling users to rapidly screen SNPs based on their local structural environment. 3DVizSNP leverages iCn3D annotations and its structural analysis functions to assess changes in structural contacts associated with mutations. Conclusions: This tool enables researchers to efficiently make use of 3D structural information to prioritize mutations for further computational and experimental impact assessment. The program is available as a webserver at https://analysistools.cancer.gov/3dvizsnp or as a standalone python program at https://github.com/CBIIT-CGBB/3DVizSNP. [ABSTRACT FROM AUTHOR]
- Published
- 2023
5. Biotite: new tools for a versatile Python bioinformatics library.
- Author
- Kunzmann, Patrick, Müller, Tom David, Greil, Maximilian, Krumbach, Jan Hendrik, Anter, Jacob Marcel, Bauer, Daniel, Islam, Faisal, and Hamacher, Kay
- Abstract
Background: Biotite is a program library for sequence and structural bioinformatics written for the Python programming language. It implements widely used computational methods into a consistent and accessible package. This allows for easy combination of various data analysis, modeling and simulation methods. Results: This article presents major functionalities introduced into Biotite since its original publication. The fields of application are shown using concrete examples. We show that the computational performance of Biotite for bioinformatics tasks is comparable to individual, special purpose software systems specifically developed for the respective single task. Conclusions: The results show that Biotite can be used as program library to either answer specific bioinformatics questions and simultaneously allow the user to write entire, self-contained software applications with sufficient performance for general application. [ABSTRACT FROM AUTHOR]
- Published
- 2023
6. Visualizing the knowledge structure and evolution of bioinformatics.
- Author
- Wang, Jiaqi, Li, Zeyu, and Zhang, Jiawan
- Subjects
- *STRUCTURAL bioinformatics, *ETHNOBIOLOGY, *DATA mining, *MACHINE learning, *BIBLIOMETRICS, *BIOINFORMATICS
- Abstract
Background: Bioinformatics has gained much attention as a fast growing interdisciplinary field. Several attempts have been conducted to explore the field of bioinformatics by bibliometric analysis, however, such works did not elucidate the role of visualization in analysis, nor focus on the relationship between sub-topics of bioinformatics. Results: First, the hotspot of bioinformatics has moderately shifted from traditional molecular biology to omics research, and the computational method has also shifted from mathematical model to data mining and machine learning. Second, DNA-related topics are bridge topics in bioinformatics research. These topics gradually connect various sub-topics that are relatively independent at first. Third, only a small part of topics we have obtained involves a number of computational methods, and the other topics focus more on biological aspects. Fourth, the proportion of computing-related topics hit a trough in the 1980s. During this period, the use of traditional calculation methods such as mathematical model declined in a large proportion while the new calculation methods such as machine learning have not been applied in a large scale. This proportion began to increase gradually after the 1990s. Fifth, although the proportion of computing-related topics is only slightly higher than the original, the connection between other topics and computing-related topics has become closer, which means the support of computational methods is becoming increasingly important for the research of bioinformatics. Conclusions: The results of our analysis imply that research on bioinformatics is becoming more diversified and the ranking of computational methods in bioinformatics research is also gradually improving. [ABSTRACT FROM AUTHOR]
- Published
- 2022
7. Expanding the clinical-pathological and genetic spectrum of RYR1-related congenital myopathies with cores and minicores: an Italian population study.
- Author
- Fusto, Aurora, Cassandrini, Denise, Fiorillo, Chiara, Codemo, Valentina, Astrea, Guja, D'Amico, Adele, Maggi, Lorenzo, Magri, Francesca, Pane, Marika, Tasca, Giorgio, Sabbatini, Daniele, Bello, Luca, Battini, Roberta, Bernasconi, Pia, Fattori, Fabiana, Bertini, Enrico Silvio, Comi, Giacomo, Messina, Sonia, Mongini, Tiziana, and Moroni, Isabella
- Subjects
- *MUSCLE weakness, *MUSCLE diseases, *RYANODINE receptors, *STRUCTURAL bioinformatics, *SYMPTOMS, *MISSENSE mutation
- Abstract
Mutations in the RYR1 gene, encoding ryanodine receptor 1 (RyR1), are a well-known cause of Central Core Disease (CCD) and Multi-minicore Disease (MmD). We screened a cohort of 153 patients carrying a histopathological diagnosis of core myopathy (cores and minicores) for RYR1 mutations. At least one RYR1 mutation was identified in 69 of them and these patients were further studied. Clinical and histopathological features were collected. Clinical phenotype was highly heterogeneous, ranging from asymptomatic or paucisymptomatic hyperCKemia to severe muscle weakness and skeletal deformity with loss of ambulation. Sixty-eight RYR1 mutations, generally missense, were identified, of which 16 were novel. The combined analysis of the clinical presentation, disease progression and the structural bioinformatic analyses of RYR1 allowed some phenotypes to be associated with mutations in specific domains. In addition, this study highlighted the potential of structural bioinformatics in predicting the pathogenicity of RYR1 mutations. Further improvement in the comprehension of genotype–phenotype relationship of core myopathies can be expected in the near future: the current lack of the human RyR1 crystal structure paired with the presence of large intrinsically disordered regions in RyR1, and the frequent presence of more than one RYR1 mutation in core myopathy patients, require designing novel investigation strategies to completely address RyR1 mutation effect. [ABSTRACT FROM AUTHOR]
- Published
- 2022
8. Comparison of carbohydrate ABC importers from Mycobacterium tuberculosis.
- Author
- De la Torre, Lilia I., Vergara Meza, José G., Cabarca, Sindy, Costa-Martins, André G., and Balan, Andrea
- Subjects
- *MYCOBACTERIUM tuberculosis, *ATP-binding cassette transporters, *MYCOBACTERIA, *STRUCTURAL bioinformatics, *PROTEIN structure, *IMPORTERS, *DRUG development
- Abstract
Background: Mycobacterium tuberculosis, the etiological agent of tuberculosis, has at least four ATP-Binding Cassette (ABC) transporters dedicated to carbohydrate uptake: LpqY/SugABC, UspABC, Rv2038c-41c, and UgpAEBC. LpqY/SugABC transporter is essential for M. tuberculosis survival in vivo and potentially involved in the recycling of cell wall components. The three-dimensional structures of substrate-binding proteins (SBPs) LpqY, UspC, and UgpB were described, however, questions about how these proteins interact with the cognate transporter are still being explored. Components of these transporters, such as SBPs, show high immunogenicity and could be used for the development of diagnostic and therapeutic tools. In this work, we used a phylogenetic and structural bioinformatics approach to compare the four systems, in an attempt to predict functionally important regions. Results: Through the analysis of the putative orthologs of the carbohydrate ABC importers in species of Mycobacterium genus it was shown that Rv2038c-41c and UgpAEBC systems are restricted to pathogenic species. We showed that the components of the four ABC importers are phylogenetically separated into four groups defined by structural differences in regions that modulate the functional activity or the interaction with domain partners. The regulatory region in nucleotide-binding domains, the periplasmic interface in transmembrane domains and the ligand-binding pocket of the substrate-binding proteins define their substrates and segregation in different branches. The interface between transmembrane domains and nucleotide-binding domains show conservation of residues and charge. Conclusions: The presence of four ABC transporters in M. tuberculosis dedicated to uptake and transport of different carbohydrate sources, and the exclusivity of at least two of them being present only in pathogenic species of Mycobacterium genus, highlights their relevance in virulence and pathogenesis. 
The significant differences in the SBPs, not present in eukaryotes, and in the regulatory region of NBDs can be explored for the development of inhibitory drugs targeting the bacillus. The possible promiscuity of NBDs also contributes to a less specific and more comprehensive control approach. [ABSTRACT FROM AUTHOR]
- Published
- 2021
9. Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains.
- Author
- Miladi, Milad, Raden, Martin, Will, Sebastian, and Backofen, Rolf
- Subjects
- *MARKOV processes, *BASE pairs, *ALGORITHMS, *RNA, *NON-coding RNA
- Abstract
Motivation: Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of O(n^6) in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. Results: Here, we introduce a novel variant of Sankoff's algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow to accurately capture structure probabilities, which obey a conditional dependency. This model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). Conclusions: Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of the Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA. [ABSTRACT FROM AUTHOR]
- Published
- 2020
10. Computational models of melanoma.
- Author
- Albrecht, Marco, Lucarelli, Philippe, Kulms, Dagmar, and Sauter, Thomas
- Subjects
- COMPUTATIONAL biology, MELANOMA, STRUCTURAL bioinformatics, DESCRIPTIVE statistics, EXPERIMENTAL medicine, SYSTEMS biology, BIOLOGY
- Abstract
Genes, proteins, or cells influence each other and consequently create patterns, which can be increasingly better observed by experimental biology and medicine. Thereby, descriptive methods of statistics and bioinformatics sharpen and structure our perception. However, additionally considering the interconnectivity between biological elements promises a deeper and more coherent understanding of melanoma. For instance, integrative network-based tools and well-grounded inductive in silico research reveal disease mechanisms, stratify patients, and support treatment individualization. This review gives an overview of different modeling techniques beyond statistics, shows how different strategies align with the respective medical biology, and identifies possible areas of new computational melanoma research. [ABSTRACT FROM AUTHOR]
- Published
- 2020
11. Self-analysis of repeat proteins reveals evolutionarily conserved patterns.
- Author
- Merski, Matthew, Młynarczyk, Krzysztof, Ludwiczak, Jan, Skrzeczkowski, Jakub, Dunin-Horkawicz, Stanisław, and Górna, Maria W.
- Subjects
- *PROTEINS, *CONVERGENT evolution, *PROTEIN analysis, *SEQUENCE analysis
- Abstract
Background: Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences lead to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric. Results: Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure with an expected loss of 50% of Jaccard similarity due to a loss of 8.2% sequence identity. To perform method testing, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content in dot plots could be used to identify repeat proteins from pure sequence with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered and a significant amount (82.9%) of clusters containing between 5 and 200 members were of a single functional type. Conclusions: Dot plot analysis of repeat proteins attempts to obviate issues that arise due to the sequence degeneracy of repeat proteins. These results show that this kind of analysis can efficiently be applied to analyze repeat proteins on a large scale. [ABSTRACT FROM AUTHOR]
- Published
- 2020
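The idea of comparing self-similarity dot plots with a Jaccard metric, as described in the abstract above, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the exact-match window of 3 residues, the function names `dot_plot` and `jaccard`, and the toy sequences are all hypothetical, and both sequences are assumed to share one coordinate frame.

```python
def dot_plot(seq, window=3):
    """Set of (i, j) with i < j where the windows starting at i and j are identical."""
    n = len(seq) - window + 1
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if seq[i:i + window] == seq[j:j + window]
    }


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets of dot-plot cells."""
    union = a | b
    if not union:
        return 0.0  # arbitrary convention of this sketch for two empty plots
    return len(a & b) / len(union)


# Toy sequences (not real proteins): the second has lost its third repeat.
plot_a = dot_plot("ABCABCABC")
plot_b = dot_plot("ABCABCXYZ")
score = jaccard(plot_a, plot_b)
print(sorted(plot_a), sorted(plot_b), score)
```

Losing one of three repeats removes four of the five off-diagonal matches here, which gives a rough feel for how quickly a dot-plot pattern can decay as sequence identity drops.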
12. Benchmarking the PEPOP methods for mimicking discontinuous epitopes.
- Author
- Demolombe, Vincent, de Brevern, Alexandre G., Molina, Franck, Lavigne, Géraldine, Granier, Claude, and Moreau, Violaine
- Subjects
- *TRAVELING salesman problem, *BENCHMARKING (Management), *X-ray crystallography, *PROTEIN-protein interactions, *EPITOPES
- Abstract
Background: Computational methods provide approaches to identify epitopes in protein Ags to help characterizing potential biomarkers identified by high-throughput genomic or proteomic experiments. PEPOP version 1.0 was developed as an antigenic or immunogenic peptide prediction tool. We have now improved this tool by implementing 32 new methods (PEPOP version 2.0) to guide the choice of peptides that mimic discontinuous epitopes and thus potentially able to replace the cognate protein Ag in its interaction with an Ab. In the present work, we describe these new methods and the benchmarking of their performances. Results: Benchmarking was carried out by comparing the peptides predicted by the different methods and the corresponding epitopes determined by X-ray crystallography in a dataset of 75 Ag-Ab complexes. The Sensitivity (Se) and Positive Predictive Value (PPV) parameters were used to assess the performance of these methods. The results were compared to that of peptides obtained either by chance or by using the SUPERFICIAL tool, the only available comparable method. Conclusion: The PEPOP methods were more efficient than, or as much as chance, and 33 of the 34 PEPOP methods performed better than SUPERFICIAL. Overall, "optimized" methods (tools that use the traveling salesman problem approach to design peptides) can predict peptides that best match true epitopes in most cases. [ABSTRACT FROM AUTHOR]
- Published
- 2019
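The two benchmark measures named in the PEPOP abstract have standard definitions: sensitivity is TP / (TP + FN) and positive predictive value is TP / (TP + FP). A minimal Python sketch follows, assuming (the abstract does not say) that both the predicted peptide and the crystallographic epitope are compared as sets of residue positions; the function name and the example positions are hypothetical.

```python
def sensitivity_ppv(predicted, epitope):
    """Return (Se, PPV) for a predicted residue set against a reference epitope."""
    predicted, epitope = set(predicted), set(epitope)
    true_pos = len(predicted & epitope)
    # Se: fraction of true epitope residues that the prediction recovers.
    se = true_pos / len(epitope) if epitope else 0.0
    # PPV: fraction of predicted residues that really belong to the epitope.
    ppv = true_pos / len(predicted) if predicted else 0.0
    return se, ppv


# Hypothetical residue positions, for illustration only.
se, ppv = sensitivity_ppv({10, 11, 12, 40}, {11, 12, 13, 14, 41})
print(se, ppv)
```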
13. MADOKA: an ultra-fast approach for large-scale protein structure similarity searching.
- Author
- Deng, Lei, Zhong, Guolun, Liu, Chenzhe, Luo, Judong, and Liu, Hui
- Subjects
- *INTERNET servers, *PROTEIN structure, *STRUCTURAL bioinformatics, *PROTEIN analysis, *DATA structures, *WEB search engines, *AMINO acid sequence
- Abstract
Background: Protein comparative analysis and similarity searches play essential roles in structural bioinformatics. A couple of algorithms for protein structure alignments have been developed in recent years. However, facing the rapid growth of protein structure data, improving overall comparison performance and running efficiency with massive sequences is still challenging. Results: Here, we propose MADOKA, an ultra-fast approach for massive structural neighbor searching using a novel two-phase algorithm. Initially, we apply a fast alignment between pairwise structures. Then, we employ a score to select pairs with more similarity to carry out a more accurate fragment-based residue-level alignment. MADOKA performs about 6–100 times faster than existing methods, including TM-align and SAL, in massive alignments. Moreover, the quality of structural alignment of MADOKA is better than the existing algorithms in terms of TM-score and number of aligned residues. We also develop a web server to search structural neighbors in PDB database (About 360,000 protein chains in total), as well as additional features such as 3D structure alignment visualization. The MADOKA web server is freely available at: http://madoka.denglab.org/ Conclusions: MADOKA is an efficient approach to search for protein structure similarity. In addition, we provide a parallel implementation of MADOKA which exploits massive power of multi-core CPUs. [ABSTRACT FROM AUTHOR]
- Published
- 2019
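The MADOKA abstract reports alignment quality as a TM-score. For reference, the commonly used definition is TM = (1 / L) * sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8 and L the length of the reference chain. The Python sketch below evaluates that formula for a list of already-computed distances; it is not MADOKA's code, the function name is hypothetical, and the 0.5 Å floor on d0 for short chains is an assumption of this sketch.

```python
def tm_score(distances, target_length):
    """TM-score from per-residue distances (in angstroms) of aligned pairs."""
    if target_length > 15:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    else:
        d0 = 0.5  # assumed floor for very short chains
    # Each aligned pair contributes between 0 and 1; unaligned residues add nothing.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


perfect = tm_score([0.0] * 100, 100)   # every residue superimposed exactly
half = tm_score([0.0] * 50, 100)       # only half of the chain aligned
print(perfect, half)
```

Normalising by the reference length rather than by the number of aligned pairs is what makes the score penalise partial alignments, as the second call shows.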
14. Predicting RNA secondary structure via adaptive deep recurrent neural networks with energy-based filter.
- Author
- Lu, Weizhong, Tang, Ye, Wu, Hongjie, Huang, Hongmei, Fu, Qiming, Qiu, Jing, and Li, Haiou
- Subjects
- *RECURRENT neural networks, *SMART structures, *RNA, *BASE pairs, *STRUCTURAL bioinformatics, *NUMERIC databases
- Abstract
Background: RNA secondary structure prediction is an important issue in structural bioinformatics, and RNA pseudoknotted secondary structure prediction represents an NP-hard problem. Recently, many different machine-learning methods, Markov models, and neural networks have been employed for this problem, with encouraging results regarding their predictive accuracy; however, their performances are usually limited by the requirements of the learning model and over-fitting, which requires use of a fixed number of training features. Because most natural biological sequences have variable lengths, the sequences have to be truncated before the features are employed by the learning model, which not only leads to the loss of information but also destroys biological-sequence integrity. Results: To address this problem, we propose an adaptive sequence length based on deep-learning model and integrate an energy-based filter to remove the over-fitting base pairs. Conclusions: Comparative experiments conducted on an authoritative dataset RNA STRAND (RNA secondary STRucture and statistical Analysis Database) revealed a 12% higher accuracy relative to three currently used methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
15. An algebraic language for RNA pseudoknots comparison.
- Author
- Quadrini, Michela, Tesei, Luca, and Merelli, Emanuela
- Subjects
- *MOLECULAR structure of RNA, *PROTEIN structure, *STRUCTURAL bioinformatics, *INTEGER programming, *PROTEIN models, *ARBITRARY constants
- Abstract
Background: RNA secondary structure comparison is a fundamental task for several studies, among which are RNA structure prediction and evolution. The comparison can currently be done efficiently only for pseudoknot-free structures due to their inherent tree representation. Results: In this work, we introduce an algebraic language to represent RNA secondary structures with arbitrary pseudoknots. Each structure is associated with a unique algebraic RNA tree that is derived from a tree grammar having concatenation, nesting and crossing as operators. From an algebraic RNA tree, an abstraction is defined in which the primary structure is neglected. The resulting structural RNA tree allows us to define a new measure of similarity calculated exploiting classical tree alignment. Conclusions: The tree grammar with its operators permit to uniquely represent any RNA secondary structure as a tree. Structural RNA trees allow us to perform comparison of RNA secondary structures with arbitrary pseudoknots without taking into account the primary structure. [ABSTRACT FROM AUTHOR]
- Published
- 2019
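The three operators named in the abstract above (concatenation, nesting and crossing) correspond to the three ways two base pairs can be arranged along the sequence. The Python sketch below classifies a pair of arcs accordingly; it only illustrates that distinction and is not the authors' tree grammar. The function name `relation` is hypothetical, and each base pair is assumed to be a tuple of two distinct sequence positions, with no position shared between pairs.

```python
def relation(p, q):
    """Classify two base pairs as 'concatenation', 'nesting' or 'crossing'."""
    # Order each pair as (start, end), then order the two pairs by start.
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    if j < k:
        return "concatenation"  # i < j < k < l: the arcs lie side by side
    if l < j:
        return "nesting"        # i < k < l < j: one arc encloses the other
    return "crossing"           # i < k < j < l: the arcs interleave (pseudoknot)


print(relation((0, 3), (4, 7)))
print(relation((0, 9), (1, 8)))
print(relation((0, 5), (3, 8)))
```

A structure is pseudoknot-free exactly when no two of its base pairs are in the crossing relation, which is why such structures admit a plain tree representation.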
16. Biotite: a unifying open source computational biology framework in Python.
- Author
- Kunzmann, Patrick and Hamacher, Kay
- Subjects
- *OPEN source software, *COMPUTATIONAL biology, *BIOLOGICAL databases, *ELECTRONIC data processing, *SEQUENCE analysis, *STRUCTURAL bioinformatics, *COMPUTER software
- Abstract
Background: As molecular biology is creating an increasing amount of sequence and structure data, the multitude of software to analyze this data is also rising. Most of the programs are made for a specific task, hence the user often needs to combine multiple programs in order to reach a goal. This can make the data processing unhandy, inflexible and even inefficient due to an overhead of read/write operations. Therefore, it is crucial to have a comprehensive, accessible and efficient computational biology framework in a scripting language to overcome these limitations. Results: We have developed the Python package Biotite: a general computational biology framework, that represents sequence and structure data based on NumPy ndarrays. Furthermore the package contains seamless interfaces to biological databases and external software. The source code is freely accessible at https://github.com/biotite-dev/biotite. Conclusions: Biotite is unifying in two ways: At first it bundles popular tasks in sequence analysis and structural bioinformatics in a consistently structured package. Secondly it addresses two groups of users: novice programmers get an easy access to Biotite due to its simplicity and the comprehensive documentation. On the other hand, advanced users can profit from its high performance and extensibility. They can implement their algorithms upon Biotite, so they can skip writing code for general functionality (like file parsers) and can focus on what their software makes unique. [ABSTRACT FROM AUTHOR]
- Published
- 2018
17. LCS-TA to identify similar fragments in RNA 3D structures.
- Author
- Wiedemann, Jakub, Zok, Tomasz, Milostan, Maciej, and Szachniuk, Marta
- Subjects
- *RNA, *MOLECULAR evolution, *STRUCTURAL bioinformatics
- Abstract
Background: In modern structural bioinformatics, comparison of molecular structures aimed to identify and assess similarities and differences between them is one of the most commonly performed procedures. It gives the basis for evaluation of in silico predicted models. It constitutes the preliminary step in searching for structural motifs. In particular, it supports tracing the molecular evolution. Faced with an ever-increasing amount of available structural data, researchers need a range of methods enabling comparative analysis of the structures from either global or local perspective. Results: Herein, we present a new, superposition-independent method which processes pairs of RNA 3D structures to identify their local similarities. The similarity is considered in the context of structure bending and bonds' rotation which are described by torsion angles. In the analyzed RNA structures, the method finds the longest continuous segments that show similar torsion within a user-defined threshold. The length of the segment is provided as local similarity measure. The method has been implemented as LCS-TA algorithm (Longest Continuous Segments in Torsion Angle space) and is incorporated into our MCQ4Structures application, freely available for download from http://www.cs.put.poznan.pl/tzok/mcq/. Conclusions: The presented approach ties torsion-angle-based method of structure analysis with the idea of local similarity identification by handling continuous 3D structure segments. The first method, implemented in MCQ4Structures, has been successfully utilized in RNA-Puzzles initiative. The second one, originally applied in Euclidean space, is a component of LGA (Local-Global Alignment) algorithm commonly used in assessing protein models submitted to CASP. This unique combination of concepts implemented in LCS-TA provides a new perspective on structure quality assessment in local and quantitative aspect. 
A series of computational experiments show the first results of applying our method to comparison of RNA 3D models. LCS-TA can be used for identifying strengths and weaknesses in the prediction of RNA tertiary structures. [ABSTRACT FROM AUTHOR]
- Published
- 2017
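The core search described in the LCS-TA abstract, finding the longest continuous segment whose torsion angles agree within a user-defined threshold, can be sketched as a single pass over two structures. The Python code below is a simplified illustration and not the published algorithm: it assumes the residues of both structures correspond by index, that each residue carries a tuple of torsion angles in degrees, and that a residue counts as similar only if every angle differs by at most the threshold. The function names are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def longest_similar_segment(torsions_a, torsions_b, threshold):
    """Return (start, length) of the longest run of residues with similar torsions."""
    best_start, best_len, run_start = 0, 0, None
    for idx, (res_a, res_b) in enumerate(zip(torsions_a, torsions_b)):
        similar = all(angle_diff(x, y) <= threshold for x, y in zip(res_a, res_b))
        if similar:
            if run_start is None:
                run_start = idx          # a new run begins here
            length = idx - run_start + 1
            if length > best_len:
                best_start, best_len = run_start, length
        else:
            run_start = None             # the run is broken
    return best_start, best_len


# Hypothetical single-angle-per-residue data, for illustration only.
model = [(10.0,), (20.0,), (350.0,), (100.0,), (50.0,)]
target = [(12.0,), (25.0,), (5.0,), (180.0,), (52.0,)]
segment = longest_similar_segment(model, target, threshold=15.0)
print(segment)
```

Taking the difference on the circle matters: 350° and 5° are 15° apart, not 345°, so the third residue still belongs to the similar segment.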
18. Incorporating biological information in sparse principal component analysis with application to genomic data.
- Author
- Ziyi Li, Safo, Sandra E., and Qi Long
- Subjects
- *PRINCIPAL components analysis, *GENETIC databases, *PATTERN recognition systems, *GENE expression, *STRUCTURAL bioinformatics
- Abstract
Background: Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Results: Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related with glioblastoma. Conclusions: The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2017
19. 3D deep convolutional neural networks for amino acid environment similarity analysis.
- Author
-
Wen Torng and Altman, Russ B.
- Subjects
- *
ARTIFICIAL neural networks , *DEEP learning , *PROTEIN analysis , *PROTEIN structure , *STRUCTURAL bioinformatics - Abstract
Background: Central to protein biology is the understanding of how structural elements give rise to observed function. The surfeit of protein structural data enables development of computational methods to systematically derive rules governing structural-functional relationships. However, performance of these methods depends critically on the choice of protein structural representation. Most current methods rely on features that are manually selected based on knowledge about protein structures. These are often general-purpose but not optimized for the specific application of interest. In this paper, we present a general framework that applies 3D convolutional neural network (3DCNN) technology to structure-based protein analysis. The framework automatically extracts task-specific features from the raw atom distribution, driven by supervised labels. As a pilot study, we use our network to analyze local protein microenvironments surrounding the 20 amino acids, and predict the amino acids most compatible with environments within a protein structure. To further validate the power of our method, we construct two amino acid substitution matrices from the prediction statistics and use them to predict effects of mutations in T4 lysozyme structures. Results: Our deep 3DCNN achieves a two-fold increase in prediction accuracy compared to models that employ conventional hand-engineered features and successfully recapitulates known information about similar and different microenvironments. Models built from our predictions and substitution matrices achieve an 85% accuracy predicting outcomes of the T4 lysozyme mutation variants. Our substitution matrices contain rich information relevant to mutation analysis compared to well-established substitution matrices. Finally, we present a visualization method to inspect the individual contributions of each atom to the classification decisions. 
Conclusions: End-to-end trained deep learning networks consistently outperform methods using hand-engineered features, suggesting that the 3DCNN framework is well suited for analysis of protein microenvironments and may be useful for other protein structural analyses. [ABSTRACT FROM AUTHOR]
- Published
- 2017
20. Across-proteome modeling of dimer structures for the bottom-up assembly of protein-protein interaction networks.
- Author
-
Maheshwari, Surabhi and Brylinski, Michal
- Subjects
- *
PROTEIN-protein interactions , *STRUCTURAL bioinformatics , *MACHINE learning , *GENE ontology , *DIMERS , *PROTEOMICS - Abstract
Background: Deciphering complete networks of interactions between proteins is the key to comprehending cellular regulatory mechanisms. A significant effort has been devoted to expanding the coverage of the proteome-wide interaction space at the molecular level. Although a growing body of research shows that protein docking can, in principle, be used to predict biologically relevant interactions, the accuracy of the across-proteome identification of interacting partners and the selection of near-native complex structures still need to be improved. Results: In this study, we developed a new method to discover and model protein interactions employing an exhaustive all-to-all docking strategy. This approach integrates molecular modeling, structural bioinformatics, machine learning, and functional annotation filters in order to provide interaction data for the bottom-up assembly of protein interaction networks. Encouragingly, the success rates for dimer modeling are 57.5 and 48.7% when experimental and computer-generated monomer structures are employed, respectively. Further, our protocol correctly identifies 81% of protein-protein interactions at the expense of only a 19% false positive rate. As a proof of concept, 61,913 protein-protein interactions were confidently predicted and modeled for the proteome of E. coli. Finally, we validated our method against the human immune disease pathway. Conclusions: Protein docking supported by evolutionary restraints and machine learning can be used to reliably identify and model biologically relevant protein assemblies at the proteome scale. Moreover, the accuracy of the identification of protein-protein interactions is improved by considering only those protein pairs co-localized in the same cellular compartment and involved in the same biological process. 
The modeling protocol described in this communication can be applied to detect protein-protein interactions in other organisms and pathways as well as to construct dimer structures and estimate the confidence of protein interactions experimentally identified with high-throughput techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2017
21. GPCRs from Fusarium graminearum detection, modeling and virtual screening - the search for new routes to control head blight disease.
- Author
-
Bresso, Emmanuel, Togawa, Roberto, Hammond-Kosack, Kim, Urban, Martin, Maigret, Bernard, and Martins, Natalia Florencio
- Subjects
- *
G protein coupled receptors , *PATHOGENIC microorganisms , *ANIMAL health , *FUNGICIDES , *FOOD contamination prevention - Abstract
Background: Fusarium graminearum (FG) is one of the major cereal infecting pathogens causing high economic losses worldwide and resulting in adverse effects on human and animal health. Therefore, the development of new fungicides against FG is an important issue to reduce cereal infection and economic impact. In the strategy for developing new fungicides, a critical step is the identification of new targets against which innovative chemical weapons can be designed. As several G-protein coupled receptors (GPCRs) are implicated in signaling pathways critical for fungal development and survival, such proteins could be valuable targets to reduce Fusarium growth and therefore to prevent food contamination. Results: In this study, GPCRs were predicted in the FG proteome using a manually curated pipeline dedicated to the identification of GPCRs. Based on several successive filters, the most appropriate GPCR candidate target for developing new fungicides was selected. Searching for new compounds blocking this particular target requires knowledge of its 3D structure. As no experimental X-ray structure of the selected protein was available, a 3D model was built by homology modeling. The model quality and stability were checked by 100 ns of molecular dynamics simulations prior to the virtual screening step. Two stable conformations representative of the conformational families of the protein were extracted from the 100 ns simulation and were used for an ensemble docking campaign. The virtual screening step comprised the exploration of a chemical library with 11,000 compounds that were docked to the GPCR model. Among these compounds, we selected the ten top-ranked nontoxic molecules proposed to be experimentally tested to validate the in silico simulation. 
Conclusions: This study provides an integrated process merging genomics, structural bioinformatics and drug design for proposing innovative solutions to a worldwide threat to grain producers and consumers. [ABSTRACT FROM AUTHOR]
- Published
- 2016
22. LENS: web-based lens for enrichment and network studies of human proteins.
- Author
-
Handen, Adam and Ganapathiraju, Madhavi K.
- Subjects
- *
PROTEIN analysis , *MEDICAL genetics , *COMPUTERS in biology , *GENOMIC information retrieval , *STRUCTURAL bioinformatics , *PROTEIN-protein interactions - Abstract
Background: Network analysis is a common approach for the study of the genetic view of diseases and biological pathways. Typically, when a set of genes is identified to be of interest in relation to a disease, say through a genome-wide association study (GWAS) or a different gene expression study, these genes are analyzed in the context of their protein-protein interaction (PPI) networks. Further analysis is carried out to compute the enrichment of known pathways and disease-associations in the network. Having tools for such analysis at the fingertips of biologists without the requirement for computer programming or curation of data would accelerate the characterization of genes of interest. Currently available tools do not integrate network and enrichment analysis and their visualizations, and most of them present results in formats not most conducive to human cognition. Results: We developed the tool Lens for Enrichment and Network Studies of human proteins (LENS) that performs network, pathway and disease enrichment analyses on genes of interest to users. The tool creates a visualization of the network, provides easy to read statistics on network connectivity, and displays Venn diagrams with statistical significance values of the network's association with drugs, diseases, pathways, and GWASs. We used the tool to analyze gene sets related to craniofacial development, autism, and schizophrenia. Conclusion: LENS is a web-based tool that does not require any download or plugins to use. The tool is free and does not require login for use, and is available at http://severus.dbmi.pitt.edu/LENS. [ABSTRACT FROM AUTHOR]
- Published
- 2015
23. Quantifying conformational changes in GPCRs: glimpse of a common functional mechanism.
- Author
-
Dalton, James A. R., Lans, Isaias, and Giraldo, Jesús
- Subjects
- *
G proteins , *MOLECULAR structure , *SPECTRUM analysis , *STRUCTURAL bioinformatics , *CRYSTALLIZATION - Abstract
Background: G-protein-coupled receptors (GPCRs) are important drug targets and a better understanding of their molecular mechanisms would be desirable. The crystallization rate of GPCRs has accelerated in recent years as techniques have become more sophisticated, particularly with respect to Class A GPCRs interacting with G-proteins. These developments have made it possible for a quantitative analysis of GPCR geometrical features and binding-site conformations, including a statistical comparison between Class A GPCRs in active (agonist-bound) and inactive (antagonist-bound) states. Results: Here we implement algorithms for the analysis of interhelical angles, distances, interactions and binding-site volumes in the transmembrane domains of 25 Class A GPCRs (7 active and 18 inactive). Two interhelical angles change in a statistically significant way between average inactive and active states: TM3-TM6 (by -9°) and TM6-TM7 (by +12°). A third interhelical angle, TM5-TM6, shows a trend, changing by -9°. In the transition from inactive to active states, average van der Waals interactions between TM3 and TM7 significantly increase as the average distance between them decreases by >2 Å. Average H-bonding between TM3 and TM6 decreases but is seemingly compensated by an increase in H-bonding between TM5 and TM6. In five Class A GPCRs, crystallized in both active and inactive states, increased H-bonding of agonists to TM6 and TM7, relative to antagonists, is observed. These protein-agonist interactions likely favour a change in the TM6-TM7 angle, which creates a narrowing in the binding pocket of activated receptors and an average ∼200 Å³ reduction in volume. Conclusions: In terms of similar conformational changes and agonist binding pattern, Class A GPCRs appear to share a common mechanism of activation, which can be exploited in future drug development. [ABSTRACT FROM AUTHOR]
- Published
- 2015
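Illustrative sketch for entry 23: the interhelical angles reported in the abstract can be read as angles between helix axis vectors. The Python below is a minimal sketch under assumptions that do not come from the paper: each axis is approximated from the centroids of the first and last four C-alpha atoms, the function names `helix_axis` and `interhelical_angle` are hypothetical, and the result is an unsigned angle, whereas the abstract reports signed changes.

```python
import math


def helix_axis(ca_coords):
    """Approximate a helix axis as the vector from the centroid of the first
    four C-alpha atoms to the centroid of the last four (an assumed, crude
    definition chosen for illustration)."""
    def centroid(points):
        n = len(points)
        return tuple(sum(p[i] for p in points) / n for i in range(3))

    start = centroid(ca_coords[:4])
    end = centroid(ca_coords[-4:])
    return tuple(e - s for s, e in zip(start, end))


def interhelical_angle(axis_a, axis_b):
    """Unsigned angle in degrees between two axis vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))
```

Comparing the angle for the same helix pair in an inactive and an active structure would give the kind of per-pair change the abstract summarizes; a sign convention would have to be added to reproduce values such as -9° or +12°.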
24. A method of searching for related literature on protein structure analysis by considering a user's intention.
- Author
-
Ito, Azusa and Ohkawa, Takenao
- Subjects
- *
PROTEINS , *GENE ontology , *PROTEIN structure , *BIOINFORMATICS , *STRUCTURAL bioinformatics - Abstract
Background: In recent years, with advances in techniques for protein structure analysis, the knowledge about protein structure and function has been published in a vast number of articles. A method to search for specific publications from such a large pool of articles is needed. In this paper, we propose a method to search for related articles on protein structure analysis by using an article itself as a query. Results: Each article is represented as a set of concepts in the proposed method. Then, by using similarities among concepts formulated from databases such as Gene Ontology, similarities between articles are evaluated. In this framework, the desired search results vary depending on the user's search intention because a variety of information is included in a single article. Therefore, the proposed method provides not only one input article (primary article) but also additional articles related to it as an input query to determine the search intention of the user, based on the relationship between two query articles. In other words, based on the concepts contained in the input article and additional articles, we actualize a relevant literature search that considers user intention by varying the degree of attention given to each concept and modifying the concept hierarchy graph. Conclusions: We performed an experiment to retrieve relevant papers from articles on protein structure analysis registered in the Protein Data Bank by using three query datasets. The experimental results yielded search results with better accuracy than when user intention was not considered, confirming the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2015
25. PCalign: a method to quantify physicochemical similarity of protein-protein interfaces.
- Author
-
Shanshan Cheng, Yang Zhang, and Brooks III, Charles L.
- Subjects
- *
BIOLOGICAL interfaces , *PROTEIN-protein interactions , *ALGORITHMS , *GEOMETRIC approach , *HASHING - Abstract
Background: Structural comparison of protein-protein interfaces provides valuable insights into the functional relationship between proteins, which may not solely arise from shared evolutionary origin. The few methods that exist for such comparative studies have focused on structural models determined at atomic resolution, and may miss interesting patterns present in large macromolecular complexes that are typically solved by low-resolution techniques. Results: We developed a coarse-grained method, PCalign, to quantitatively evaluate physicochemical similarities between a given pair of protein-protein interfaces. This method uses an order-independent algorithm, geometric hashing, to superimpose the backbone atoms of a given pair of interfaces, and provides a normalized scoring function, PC-score, to account for the extent of overlap in terms of both geometric and chemical characteristics. We demonstrate that PCalign outperforms existing methods, and additionally facilitates comparative studies across models of different resolutions, which are not accommodated by existing methods. Furthermore, we illustrate potential application of our method to recognize interesting biological relationships masked by apparent lack of structural similarity. Conclusions: PCalign is a useful method for recognizing shared chemical and spatial patterns among protein-protein interfaces. It outperforms existing methods for high-quality data, and additionally facilitates comparison across structural models with different levels of detail, with proven robustness against noise. [ABSTRACT FROM AUTHOR]
- Published
- 2015
26. Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns.
- Author
-
Meysman, Pieter, Cheng Zhou, Cule, Boris, Goethals, Bart, and Laukens, Kris
- Subjects
- *
AMINO acid sequence , *PROTEIN structure , *AMINO acid residues , *STRUCTURAL bioinformatics , *DNA-protein interactions - Abstract
Background: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. Results: Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were compared in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. Conclusions: The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent from protein function or specific conserved domains. [ABSTRACT FROM AUTHOR]
- Published
- 2015
27. PyTMs: a useful PyMOL plugin for modeling common post-translational modifications.
- Author
-
Warnecke, Andreas, Sandalova, Tatyana, Achour, Adnane, and Harris, Robert A.
- Abstract
Background: Post-translational modifications (PTMs) constitute a major aspect of protein biology, particularly signaling events. Conversely, several different pathophysiological PTMs are hallmarks of oxidative imbalance or inflammatory states and are strongly associated with pathogenesis of autoimmune diseases or cancers. Accordingly, it is of interest to assess both the biological and structural effects of modification. For the latter, computer-based modeling offers an attractive option. We thus identified the need for easily applicable modeling options for PTMs. Results: We developed PyTMs, a plugin implemented with the commonly used visualization software PyMOL. PyTMs enables users to introduce a set of common PTMs into protein/peptide models and can be used to address research questions related to PTMs. Ten types of modification are currently supported, including acetylation, carbamylation, citrullination, cysteine oxidation, malondialdehyde adducts, methionine oxidation, methylation, nitration, proline hydroxylation and phosphorylation. Furthermore, advanced settings integrate the pre-selection of surface-exposed atoms, define stereochemical alternatives and allow for basic structure optimization of the newly modified residues. Conclusion: PyTMs is a useful, user-friendly modelling plugin for PyMOL. Advantages of PyTMs include standardized generation of PTMs, rapid time-to-result and facilitated user control. Although modeling cannot substitute for conventional structure determination it constitutes a convenient tool that allows uncomplicated exploration of potential implications prior to experimental investments and basic explanation of experimental data. PyTMs is freely available as part of the PyMOL script repository project on GitHub and will further evolve. [ABSTRACT FROM AUTHOR]
- Published
- 2014
28. iview: an interactive WebGL visualizer for protein-ligand complex.
- Author
-
Hongjian Li, Kwong-Sak Leung, Takanori Nakane, and Man-Hon Wong
- Subjects
- *
WEBGL (Computer program language) , *PROTEIN-ligand interactions , *DRUG design , *MACROMOLECULES , *VAN der Waals forces - Abstract
Background: Visualization of protein-ligand complex plays an important role in elaborating protein-ligand interactions and aiding novel drug design. Most existing web visualizers either rely on slow software rendering, or lack virtual reality support. The vital feature of macromolecular surface construction is also unavailable. Results: We have developed iview, an easy-to-use interactive WebGL visualizer of protein-ligand complex. It exploits hardware acceleration rather than software rendering. It features three special effects in virtual reality settings, namely anaglyph, parallax barrier and oculus rift, resulting in visually appealing identification of intermolecular interactions. It supports four surface representations including Van der Waals surface, solvent excluded surface, solvent accessible surface and molecular surface. Moreover, based on the feature-rich version of iview, we have also developed a neat and tailor-made version specifically for our istar web platform for protein-ligand docking purpose. This demonstrates the excellent portability of iview. Conclusions: Using innovative 3D techniques, we provide a user friendly visualizer that is not intended to compete with professional visualizers, but to enable easy accessibility and platform independence. [ABSTRACT FROM AUTHOR]
- Published
- 2014
29. Understanding the evolutionary structural variability and target specificity of tick salivary Kunitz peptides using next generation transcriptome data.
- Author
-
Schwarz, Alexandra, Cruz, Alejandro Cabezas, Kopecký, Jan, and Valdés, James J.
- Subjects
- *
PEPTIDES , *GENETIC transcription , *IMMUNE response , *CHROMOSOME duplication , *BIOINFORMATICS - Abstract
Background: Ticks are blood-sucking arthropods and a primary function of tick salivary proteins is to counteract the host's immune response. Tick salivary Kunitz-domain proteins perform multiple functions within the feeding lesion and have been classified as venoms; thereby, constituting them as one of the important elements in the arms race with the host. The two main mechanisms advocated to explain the functional heterogeneity of tick salivary Kunitz-domain proteins are gene sharing and gene duplication. Both do not, however, elucidate the evolution of the Kunitz family in ticks from a structural dynamic point of view. The Red Queen hypothesis offers a fruitful theoretical framework to give a dynamic explanation for host-parasite interactions. Using the recent salivary gland Ixodes ricinus transcriptome we analyze, for the first time, single Kunitz-domain encoding transcripts by means of computational, structural bioinformatics and phylogenetic approaches to improve our understanding of the structural evolution of this important multigenic protein family. Results: Organizing the I. ricinus single Kunitz-domain peptides based on their cysteine motif allowed us to specify a putative target and to relate this target specificity to Illumina transcript reads during tick feeding. We observe that several of these Kunitz peptide groups vary in their translated amino acid sequence, secondary structure, antigenicity, and intrinsic disorder, and that the majority of these groups are subject to a purifying (negative) selection. We finalize by describing the evolution and emergence of these Kunitz peptides. The overall interpretation of our analyses discloses a rapidly emerging Kunitz group with a distinct disulfide bond pattern from the I. ricinus salivary gland transcriptome. Conclusions: We propose a model to explain the structural and functional evolution of tick salivary Kunitz peptides that we call target-oriented evolution. 
Our study reveals that combining analytical approaches (transcriptomes, computational, bioinformatics and phylogenetics) improves our understanding of the biological functions of important salivary gland mediators during tick feeding. [ABSTRACT FROM AUTHOR]
- Published
- 2014
30. Influenza B virus has global ordered RNA structure in (+) and (-) strands but relatively less stable predicted RNA folding free energy than allowed by the encoded protein sequence.
- Author
-
Priore, Salvatore F., Moss, Walter N., and Turner, Douglas H.
- Subjects
- *
RNA , *INFLUENZA A virus , *INFLUENZA B virus , *STRUCTURAL bioinformatics , *NUCLEOPROTEINS , *AMINO acid sequence - Abstract
Background: Influenza A virus contributes to seasonal epidemics and pandemics and contains Global Ordered RNA structure (GORS) in the nucleoprotein (NP), non-structural (NS), PB2, and M segments. A related virus, influenza B, is also a major annual public health threat, but unlike influenza A is very selective to human hosts. This study extends the search for GORS to influenza B. Findings: A survey of all available influenza B sequences reveals GORS in the (+) and (-)RNAs of the NP, NS, PB2, and PB1 gene segments. The results are similar to influenza A, except GORS is observed for the M1 segment of influenza A but not for PB1. In general, the folding free energies of human-specific influenza B RNA segments are less stable than allowable by the encoded amino acid sequence. This is consistent with findings in influenza A, where human-specific influenza RNA folds are less stable than avian and swine strains. Conclusions: These results reveal fundamental molecular similarities and differences between Influenza A and B and suggest a rational basis for choosing segments to target with therapeutics and for viral attenuation for live vaccines by altering RNA folding stability. [ABSTRACT FROM AUTHOR]
- Published
- 2013
31. iPBAvizu: a PyMOL plugin for an efficient 3D protein structure superimposition approach
- Author
-
Faure, Guilhem, Joseph, Agnel Praveen, Craveur, Pierrick, Narwani, Tarun J., Srinivasan, Narayanaswamy, Gelly, Jean-Christophe, Rebehmed, Joseph, and de Brevern, Alexandre G.
- Published
- 2019
32. PEPOP 2.0: new approaches to mimic non-continuous epitopes
- Author
-
Demolombe, Vincent, de Brevern, Alexandre G., Felicori, Liza, NGuyen, Christophe, Machado de Avila, Ricardo Andrez, Valera, Lionel, Jardin-Watelet, Bénédicte, Lavigne, Géraldine, Lebreton, Aurélien, Molina, Franck, and Moreau, Violaine
- Published
- 2019
33. Interfacing medicinal chemistry with structural bioinformatics: implications for T box riboswitch RNA drug discovery.
- Author
-
Jentzsch, Franziska and Hines, Jennifer V.
- Subjects
- *
PHARMACEUTICAL chemistry , *STRUCTURAL bioinformatics , *RIBOSWITCHES , *TRANSFER RNA , *AMINOACYLATION - Abstract
Background: The T box riboswitch controls bacterial transcription by structurally responding to tRNA aminoacylation charging ratios. Knowledge of the thermodynamic stability difference between two competing structural elements within the riboswitch, the terminator and the antiterminator, is critical for effective T box-targeted drug discovery. Methods: The ΔG of aminoacyl tRNA synthetase (aaRS) T box riboswitch terminators and antiterminators was predicted using DINAMelt and the resulting ΔΔG (ΔGTerminator - ΔGAntiterminator) values were compared. Results: Average ΔΔG values did not differ significantly between the bacterial species analyzed, but there were significant differences based on the type of aaRS. Conclusions: The data indicate that, of the bacteria studied, there is little potential for drug targeting based on overall bacteria-specific thermodynamic differences of the T box antiterminator vs. terminator stability, but that aaRS-specific thermodynamic differences could possibly be exploited for designing drug specificity. [ABSTRACT FROM AUTHOR]
- Published
- 2012
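Illustrative sketch for entry 33: the abstract defines ΔΔG as ΔG(terminator) minus ΔG(antiterminator) and compares averages across groups. The Python below shows that arithmetic only. The function names, the group labels, the assumed units (kcal/mol) and every number in `example` are hypothetical and are not data from the study.

```python
from statistics import mean


def delta_delta_g(dg_terminator, dg_antiterminator):
    """ddG = dG(terminator) - dG(antiterminator); units assumed kcal/mol."""
    return dg_terminator - dg_antiterminator


def mean_ddg_by_group(records):
    """Average ddG per group.

    records: iterable of (group, dG_terminator, dG_antiterminator) tuples,
    where group could be an aaRS type or a bacterial species.
    """
    grouped = {}
    for group, dg_term, dg_anti in records:
        grouped.setdefault(group, []).append(delta_delta_g(dg_term, dg_anti))
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical values for illustration only; not taken from the paper.
example = [
    ("aaRS_A", -20.0, -8.0),
    ("aaRS_A", -22.0, -8.0),
    ("aaRS_B", -15.0, -9.0),
]
```

Grouping the same records by species instead of aaRS type would give the other comparison the abstract mentions; a significance test on the grouped values is a separate step that is not sketched here.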
34. DockAnalyse: an application for the analysis of protein-protein interactions.
- Author
-
Amela, Isaac, Delicado, Pedro, Gómez, Antonio, Bonàs, Sílvia, Querol, Enrique, and Cedano, Juan
- Subjects
- *
PROTEIN-protein interactions , *STRUCTURAL bioinformatics , *PROTEOMICS , *PROTEIN structure , *CLUSTERING of particles - Abstract
Background: Is it possible to identify what the best solution of a docking program is? The usual answer to this question is the highest score solution, but interactions between proteins are dynamic processes, and many times the interaction regions are wide enough to permit protein-protein interactions with different orientations and/or interaction energies. In some cases, as in a multimeric protein complex, several interaction regions are possible among the monomers. These dynamic processes involve interactions with surface displacements between the proteins to finally achieve the functional configuration of the protein complex. Consequently, there is not a static and single solution for the interaction between proteins, but there are several important configurations that also have to be analyzed. Results: To extract those representative solutions from the docking output datafile, we have developed an unsupervised and automatic clustering application, named DockAnalyse. This application is based on the already existing DBscan clustering method, which searches for continuities among the clusters generated by the docking output data representation. The DBscan clustering method is very robust and, moreover, solves some of the inconsistency problems of the classical clustering methods, for example the treatment of outliers and the dependence on a previously defined number of clusters. Conclusions: DockAnalyse makes the interpretation of the docking solutions through graphical and visual representations easier by guiding the user to find the representative solutions. We have applied our new approach to analyze several protein interactions and model the dynamic protein interaction behavior of a protein complex. DockAnalyse might also be used to describe interaction regions between proteins and, therefore, guide future flexible dockings. The application (implemented in the R package) is accessible. [ABSTRACT FROM AUTHOR]
- Published
- 2010
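Illustrative sketch for entry 34: the abstract says DockAnalyse builds on DBSCAN, which needs no preset number of clusters and labels isolated points as outliers. The Python below is a minimal, generic DBSCAN written for illustration; it is not the DockAnalyse implementation (which the abstract says is in R). How docking solutions are turned into coordinates is an assumption here: each solution is simply a tuple of numbers, and `eps` and `min_pts` are the standard DBSCAN parameters.

```python
import math

NOISE = -1  # label given to outliers


def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...), or NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(j_nbrs)
    return labels


# Hypothetical 2D coordinates standing in for docking solutions.
solutions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(solutions, eps=1.5, min_pts=3)
```

With these made-up points, the two tight groups form separate clusters and the isolated point is marked as noise, which mirrors the outlier handling the abstract highlights.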
35. Fast and accurate protein substructure searching with simulated annealing and GPUs.
- Author
-
Stivala, Alex D., Stuckey, Peter J., and Wirth, Anthony I.
- Subjects
- *
DATABASES , *PROTEIN structure , *GRAPHICS processing units , *SIMULATED annealing , *STRUCTURAL bioinformatics - Abstract
Background: Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results: We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy with, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions: The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem. [ABSTRACT FROM AUTHOR]
- Published
- 2010
36. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation.
- Author
-
Gong-Hua Li and Jing-Fei Huang
- Subjects
- *
ALGORITHMS , *GENOMICS , *PROTEIN structure , *DATABASES , *STRUCTURAL bioinformatics - Abstract
Background: The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in the Protein Data Bank (PDB); thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need further improvement in accuracy, sensitivity, and computational speed. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown functions of proteins based on local protein structural similarity. This algorithm has been evaluated by building a test set including 164 enzyme families, and has also been compared to other methods. Results: The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in large-scale functional annotation. Compared to both sequence-based and global structure-based methods, CMASA can not only find remote homologous proteins but also detect active site convergence. Compared to other local structure comparison-based methods, CMASA performs better than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is also more sensitive than PINTS and more accurate than JESS (both are local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the nonredundant PDB, and at least 166 putative catalytic sites have been suggested; these sites cannot be found in the Catalytic Site Atlas (CSA). Conclusions: CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. CMASA can be used in large-scale enzyme active site annotation. 
CMASA is available through the mail-based server (http://159.226.149.45/other1/CMASA/CMASA.htm). [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
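The abstract for this record names a contact matrix as the basis of CMASA but does not describe how it is built. As a rough illustration only, a residue contact matrix is often derived from pairwise distances between representative atoms. The sketch below assumes one coordinate per residue and a hypothetical 8 Å cutoff; neither choice comes from the CMASA paper, and the function name `contact_matrix` is invented here.

```python
import math

def contact_matrix(coords, cutoff=8.0):
    """Return a symmetric 0/1 matrix marking residue pairs within `cutoff`.

    `coords` holds one (x, y, z) tuple per residue (for example, C-alpha
    positions). The cutoff value is an assumption for illustration.
    """
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                matrix[i][j] = matrix[j][i] = 1  # contacts are symmetric
    return matrix

# Toy coordinates, not taken from any real structure
toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_contacts = contact_matrix(toy)
```

Local structure comparison methods would then compare small sub-blocks of such matrices between a query and a template site; that comparison step is not sketched here.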
37. TOPSAN: a collaborative annotation environment for structural genomics.
- Author
-
Weekes, Dana, Krishna, S. Sri, Bakolitsa, Constantina, Wilson, Ian A., Godzik, Adam, and Wooley, John
- Subjects
- *
GENOMICS , *PROTEIN structure , *COMPUTER networks , *DATABASES , *STRUCTURAL bioinformatics - Abstract
Background: Many protein structures determined in high-throughput structural genomics centers, despite their significant novelty and importance, are available only as PDB depositions and are not accompanied by a peer-reviewed manuscript. Because of this they are not accessible by the standard tools of literature searches, remaining underutilized by the broad biological community. Results: To address this issue we have developed TOPSAN, The Open Protein Structure Annotation Network, a web-based platform that combines the openness of the wiki model with the quality control of scientific communication. TOPSAN enables research collaborations and scientific dialogue among globally distributed participants, the results of which are reviewed by experts and eventually validated by peer review. The immediate goal of TOPSAN is to harness the combined experience, knowledge, and data from such collaborations in order to enhance the impact of the astonishing number and diversity of structures being determined by structural genomics centers and high-throughput structural biology. Conclusions: TOPSAN combines features of automated annotation databases and formal, peer-reviewed scientific research literature, providing an ideal vehicle to bridge a gap between rapidly accumulating data from high-throughput technologies and a much slower pace for its analysis and integration with other, relevant research. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
38. Predicting β-turns and their types using predicted backbone dihedral angles and secondary structures.
- Author
-
Kountouris, Petros and Hirst, Jonathan D.
- Subjects
- *
PROTEIN folding , *SEQUENCE alignment , *PROTEIN conformation , *GENETIC techniques , *STRUCTURAL bioinformatics - Abstract
Background: β-turns are secondary structure elements usually classified as coil. Their prediction is important, because of their role in protein folding and their frequent occurrence in protein chains. Results: We have developed a novel method that predicts β-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49, when predicting the location of β-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of β-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our results are more accurate than other methods. Conclusions: We have created an accurate predictor of β-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
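The Matthews correlation coefficient reported in this record can be computed from the four counts of a binary confusion matrix. The following is a minimal sketch, not the DEBT implementation; the function name `mcc_from_counts` and the example counts are assumptions made for illustration.

```python
import math

def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when any marginal total is zero
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts: residues inside a turn are treated as positives
example_mcc = mcc_from_counts(tp=60, tn=300, fp=40, fn=50)
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is preferred over plain accuracy when turn and non-turn residues are unbalanced.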
39. Automatic structure classification of small proteins using random forest.
- Author
-
Jain, Pooja and Hirst, Jonathan D.
- Subjects
- *
MACHINE learning , *STRUCTURAL bioinformatics , *ALGORITHMS , *PROTEINS , *ARTIFICIAL intelligence - Abstract
Background: Random forest, an ensemble based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs. Results: Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of the Matthews correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to different structural levels decreases. Conclusions: The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification for those protein structures from the PDB, which are yet to be assigned to the SCOP classification hierarchy. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
40. A structural approach for finding functional modules from large biological networks.
- Author
-
Mete, Mutlu, Fusheng Tang, Xiaowei Xu, and Yuruk, Nurcan
- Subjects
- *
BIOLOGICAL systems , *ALGORITHMS , *STRUCTURAL bioinformatics , *INFORMATION networks , *PROTEIN-protein interactions , *DOCUMENT clustering - Abstract
Background: Biological systems can be modeled as complex network systems with many interactions between the components. These interactions give rise to the function and behavior of that system. For example, the protein-protein interaction network is the physical basis of multiple cellular functions. One goal of emerging systems biology is to analyze very large complex biological networks such as protein-protein interaction networks, metabolic networks, and regulatory networks to identify functional modules and assign functions to certain components of the system. Network modules do not occur by chance, so identification of modules is likely to capture the biologically meaningful interactions in large-scale PPI data. Unfortunately, existing computer-based clustering methods developed to find those modules are either not so accurate or too slow. Results: We devised a new methodology called SCAN (Structural Clustering Algorithm for Networks) that can efficiently find clusters or functional modules in complex biological networks as well as hubs and outliers. More specifically, we demonstrated that we can find functional modules in complex networks and classify nodes into various roles based on their structures. In this study, we showed the effectiveness of our methodology using the budding yeast (Saccharomyces cerevisiae) protein-protein interaction network. To validate our clustering results, we compared our clusters with the known functions of each protein. Our predicted functional modules achieved very high purity comparing with state-of-the-art approaches. Additionally the theoretical and empirical analysis demonstrated a linear running-time of the algorithm, which is the fastest approach for networks. Conclusion: We compare our algorithm with well-known modularity based clustering algorithm CNM. We successfully detect functional groups that are annotated with putative GO terms. 
Top-10 clusters with minimum p-value theoretically prove that the newly proposed algorithm partitions the network more accurately than CNM. Furthermore, manual interpretations of functional groups found by SCAN show superior performance over CNM. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
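The abstract for this record does not spell out how SCAN measures structural similarity. The sketch below assumes the definition usually given for SCAN: two nodes are similar in proportion to their shared closed neighbourhoods, normalised by the geometric mean of the neighbourhood sizes, and a node is a core when enough of its neighbours exceed a similarity threshold. The parameter names `eps` and `mu`, the helper names, and the toy graph are illustrative assumptions.

```python
import math

def structural_similarity(adj, u, v):
    """Shared closed neighbourhood of u and v, normalised by geometric mean size."""
    nu = set(adj[u]) | {u}  # closed neighbourhood includes the node itself
    nv = set(adj[v]) | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def eps_neighbourhood(adj, u, eps):
    """Members of u's closed neighbourhood with similarity at least eps."""
    candidates = set(adj[u]) | {u}
    return {v for v in candidates if structural_similarity(adj, u, v) >= eps}

def is_core(adj, u, eps, mu):
    """A node is a core if its eps-neighbourhood has at least mu members."""
    return len(eps_neighbourhood(adj, u, eps)) >= mu

# Toy network: two triangles joined by the single edge c-d
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
```

In this toy network the bridge edge c-d scores lower than edges inside a triangle, which is the property that lets a method of this kind separate modules from the connections between them.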
41. Origin of structural difference in metabolic networks with respect to temperature.
- Author
-
Takemoto, Kazuhiro and Akutsu, Tatsuya
- Subjects
- *
METABOLISM , *TEMPERATURE , *STRUCTURAL bioinformatics , *APPROXIMATION theory , *HIGH temperatures , *BIOINFORMATICS - Abstract
Background: Metabolism is believed to adaptively shape-shift with changing environment. In recent years, a structural difference with respect to temperature, which is an environmental factor, has been revealed in metabolic networks, implying that metabolic networks transit with temperature. Subsequently, elucidation of the origin of these structural differences due to temperature is important for understanding the evolution of life. However, the origin has yet to be clarified due to the complexity of metabolic networks. Results: Consequently, we propose a simple model with a few parameters to explain the transitions. We first present mathematical solutions of this model using mean-field approximation, and demonstrate that this model can reproduce structural properties, such as heterogeneous connectivity and hierarchical modularity, in real metabolic networks both qualitatively and quantitatively. We next show that the model parameters correlate with optimal growth temperature. In addition, we present a relationship between multiple cyclic properties and optimal growth temperature in metabolic networks. Conclusion: From the proposed model, we find that such structural properties are determined by the emergence of a short-cut path, which reduces the minimum distance between two nodes on a graph. Furthermore, we investigate correlations between model parameters and growth temperature; as a result, we find that the emergence of the short-cut path tends to be inhibited with increasing temperature. In addition, we also find that the short-cut path bypasses a relatively long path at high temperature when the emergence of the new path is not inhibited. Even further, additional network analysis provides convincing evidence of the reliability of the proposed model and its conclusions on the possible origins of differences in metabolic network structure. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
42. A multi-template combination algorithm for protein comparative modeling.
- Author
-
Jianlin Cheng
- Subjects
- *
PROTEINS , *BIOCHEMICAL templates , *ALGORITHMS , *CHEMICAL templates , *STRUCTURAL bioinformatics - Abstract
Background: Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms of selecting and combining multiple templates are available. Results: Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10⁻⁴). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure. Conclusion: We have developed a novel multi-template algorithm to improve protein comparative modeling. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
43. Structural deformation upon protein-protein interaction: A structural alphabet approach.
- Author
-
Martin, Juliette, Regad, Leslie, Lecornet, Hélène, and Camproux, Anne-Claude
- Subjects
- *
PROTEINS , *PROTEIN binding , *PROTEIN-protein interactions , *BIOLOGICAL interfaces , *STRUCTURAL bioinformatics - Abstract
Background: In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. Results: In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Conclusion: Our study provides qualitative information about induced fit. These results could be of help for flexible docking. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
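The "fraction of structural letters modified upon binding" in this record can be illustrated with a position-by-position comparison of two encoded backbones. This is a simplified sketch: it assumes the bound and unbound forms have already been encoded as equal-length strings over the structural alphabet, and the function name and example strings are hypothetical.

```python
def fraction_modified(unbound: str, bound: str) -> float:
    """Share of positions whose structural letter differs between two forms."""
    if len(unbound) != len(bound):
        raise ValueError("encoded structures must have the same length")
    if not unbound:
        return 0.0
    changed = sum(1 for a, b in zip(unbound, bound) if a != b)
    return changed / len(unbound)

# Made-up encodings; real structural letters come from the 27-letter alphabet
example_fraction = fraction_modified("AABBC", "AABCC")
```

Counting which letter replaces which at each changed position, instead of only whether a change occurred, would give the raw counts behind a substitution matrix of the kind the abstract mentions.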
44. Modularization of biochemical networks based on classification of Petri net t-invariants.
- Author
-
Grafahrend-Belau, Eva, Schreiber, Falk, Heiner, Monika, Sackmann, Andrea, Junker, Björn H., Grunwald, Stefanie, Speer, Astrid, Winder, Katja, and Koch, Ina
- Subjects
- *
STRUCTURAL analysis (Science) , *BIOCHEMISTRY , *CLASSIFICATION , *INVARIANTS (Mathematics) , *PETRI nets , *STRUCTURAL bioinformatics - Abstract
Background: Structural analysis of biochemical networks is a growing field in bioinformatics and systems biology. The availability of an increasing amount of biological data from molecular biological networks promises a deeper understanding but confronts researchers with the problem of combinatorial explosion. The amount of qualitative network data is growing much faster than the amount of quantitative data, such as enzyme kinetics. In many cases it is even impossible to measure quantitative data because of limitations of experimental methods, or for ethical reasons. Thus, a huge amount of qualitative data, such as interaction data, is available, but it was not sufficiently used for modeling purposes, until now. New approaches have been developed, but the complexity of data often limits the application of many of the methods. Biochemical Petri nets make it possible to explore static and dynamic qualitative system properties. One Petri net approach is model validation based on the computation of the system's invariant properties, focusing on t-invariants. T-invariants correspond to subnetworks, which describe the basic system behavior. With increasing system complexity, the basic behavior can only be expressed by a huge number of t-invariants. According to our validation criteria for biochemical Petri nets, the necessary verification of the biological meaning, by interpreting each subnetwork (t-invariant) manually, is not possible anymore. Thus, an automated, biologically meaningful classification would be helpful in analyzing t-invariants, and supporting the understanding of the basic behavior of the considered biological system. Methods: Here, we introduce a new approach to automatically classify t-invariants to cope with network complexity. 
We apply clustering techniques such as UPGMA, Complete Linkage, Single Linkage, and Neighbor Joining in combination with different distance measures to get biologically meaningful clusters (t-clusters), which can be interpreted as modules. To find the optimal number of t-clusters to consider for interpretation, the cluster validity measure, Silhouette Width, is applied. Results: We considered two different case studies as examples: a small signal transduction pathway (pheromone response pathway in Saccharomyces cerevisiae) and a medium-sized gene regulatory network (gene regulation of Duchenne muscular dystrophy). We automatically classified the t-invariants into functionally distinct t-clusters, which could be interpreted biologically as functional modules in the network. We found differences in the suitability of the various distance measures as well as the clustering methods. In terms of a biologically meaningful classification of t-invariants, the best results are obtained using the Tanimoto distance measure. Considering clustering methods, the obtained results suggest that UPGMA and Complete Linkage are suitable for clustering t-invariants with respect to the biological interpretability. Conclusion: We propose a new approach for the biological classification of Petri net t-invariants based on cluster analysis. Due to the biologically meaningful data reduction and structuring of network processes, large sets of t-invariants can be evaluated, allowing for model validation of qualitative biochemical Petri nets. This approach can also be applied to elementary mode analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
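The Tanimoto distance named in this record can be sketched for pairs of t-invariants. The example assumes each t-invariant is reduced to its support, meaning the set of transitions with a non-zero entry; that reduction, the function names, and the transition labels are assumptions for illustration and may differ from the authors' exact definition.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """One minus the ratio of shared to total transitions in two supports."""
    union = a | b
    if not union:
        return 0.0  # two empty supports are treated as identical
    return 1.0 - len(a & b) / len(union)

def distance_matrix(supports):
    """Pairwise Tanimoto distances, usable as input to hierarchical clustering."""
    return [[tanimoto_distance(x, y) for y in supports] for x in supports]

# Hypothetical t-invariant supports over transitions t1..t5
invariants = [{"t1", "t2", "t3"}, {"t2", "t3", "t4"}, {"t5"}]
distances = distance_matrix(invariants)
```

A matrix like `distances` is the kind of input that agglomerative methods such as UPGMA or Complete Linkage operate on, with the number of clusters then chosen by a validity measure such as silhouette width.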
45. TOPS++FATCAT: Fast flexible structural alignment using constraints derived from TOPS+ Strings Model.
- Author
-
Veeramalai, Mallika, Yuzhen Ye, and Godzik, Adam
- Subjects
- *
PROTEIN structure , *BIOINFORMATICS , *ALGORITHMS , *PROTEIN analysis , *STRING models (Physics) , *STRUCTURAL bioinformatics - Abstract
Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses. Results: We developed a TOPS++FATCAT algorithm that uses an intuitive description of the proteins' structures as captured in the popular TOPS diagrams to limit the search space of the aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. The TOPS++FATCAT algorithm is faster than FATCAT by more than an order of magnitude with a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than FATCAT, because the TOPS+ strings model contains important information on the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions. Software Availability: The benchmark analysis results and the compressed archive of the TOPS++FATCAT program for Linux platform can be downloaded from the following web site: http://fatcat.burnham.org/TOPS/ Conclusion: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up a possibility of interactive protein structure similarity searches. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
46. Identification of DNA-binding protein target sequences by physical effective energy functions: free energy analysis of lambda repressor-DNA complexes.
- Author
-
Moroni, Elisabetta, Caselle, Michele, and Fogolari, Federico
- Subjects
- *
DNA-binding proteins , *GENE expression , *GENETICS , *GENETIC regulation , *STRUCTURAL bioinformatics , *BIOCHEMISTRY - Abstract
Background: Specific binding of proteins to DNA is one of the most common ways gene expression is controlled. Although general rules for the DNA-protein recognition can be derived, the ambiguous and complex nature of this mechanism precludes a simple recognition code, therefore the prediction of DNA target sequences is not straightforward. DNA-protein interactions can be studied using computational methods which can complement the current experimental methods and offer some advantages. In the present work we use physical effective potentials to evaluate the DNA-protein binding affinities for the λ repressor-DNA complex for which structural and thermodynamic experimental data are available. Results: The binding free energy of two molecules can be expressed as the sum of an intermolecular energy (evaluated using a molecular mechanics forcefield), a solvation free energy term and an entropic term. Different solvation models are used including distance dependent dielectric constants, solvent accessible surface tension models and the Generalized Born model. The effect of conformational sampling by Molecular Dynamics simulations on the computed binding energy is assessed; results show that this effect is in general negative and the reproducibility of the experimental values decreases with the increase of simulation time considered. The free energy of binding for non-specific complexes, estimated using the best energetic model, agrees with earlier theoretical suggestions. As a result of these analyses, we propose a protocol for the prediction of DNA-binding target sequences. The possibility of searching regulatory elements within the bacteriophage λ genome using this protocol is explored. Our analysis shows good prediction capabilities, even in the absence of any thermodynamic data and information on the naturally recognized sequence. 
Conclusion: This study supports the conclusion that physics-based methods can offer a completely complementary methodology to sequence-based methods for the identification of DNA-binding protein target sequences. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
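The three-term decomposition stated in this record's abstract can be written compactly. The symbols below are assumptions chosen for illustration; the abstract states only that the binding free energy is the sum of an intermolecular energy, a solvation free energy term and an entropic term, and the entropic term is written here in the conventional form $-T\Delta S$.

```latex
\Delta G_{\mathrm{bind}} \;=\; \Delta E_{\mathrm{inter}} \;+\; \Delta G_{\mathrm{solv}} \;-\; T\,\Delta S
```

Here $\Delta E_{\mathrm{inter}}$ stands for the force-field intermolecular energy, $\Delta G_{\mathrm{solv}}$ for the solvation contribution (which depends on the solvation model chosen), and $T$ for the absolute temperature.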
47. Tools for integrated sequence-structure analysis with UCSF Chimera.
- Author
-
Meng, Elaine C, Pettersen, Eric F, Couch, Gregory S, Huang, Conrad C, and Ferrin, Thomas E
- Subjects
- *
MOSAICISM , *STRUCTURAL bioinformatics , *BIOINFORMATICS , *GENETICS , *COMPUTERS in biology - Abstract
Background: Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results: The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion: The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. 
UCSF Chimera is free for non-commercial use and is available for Microsoft Windows, Apple Mac OS X, Linux, and other platforms from http://www.cgl.ucsf.edu/chimera. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
48. Implications for domain fusion protein-protein interactions based on structural information.
- Author
-
Jer-Ming Chia and Kolatkar, Prasanna R.
- Subjects
- *
PROTEIN-protein interactions , *STRUCTURAL bioinformatics , *GENOMICS , *PROTEOMICS , *GENE fusion , *SACCHAROMYCES cerevisiae , *PROTEINS - Abstract
Background: Several in silico methods exist that were developed to predict protein interactions from the copious amount of genomic and proteomic data. One of these methods is Domain Fusion, which has proven to be effective in predicting functional links between proteins. Results: Analyzing the structures of multi-domain single-chain peptides, we found that domain pairs located less than 30 residues apart on a chain are almost certain to share a physical interface. The majority of these interactions are also conserved across separate chains. We make use of this observation to improve domain fusion based protein interaction predictions, and demonstrate this by implementing it on a set of Saccharomyces cerevisiae proteins. Conclusion: We show that existing structural data supports the domain fusion hypothesis. Empirical information from structural data also enables us to refine and assess domain fusion based protein interaction predictions. These interactions can then be integrated with downstream biochemical and genetic assays to generate more reliable protein interaction data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
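The observation in this record, that domain pairs fewer than 30 residues apart on a chain almost always share an interface, suggests a simple filter. The sketch below is an assumption-laden illustration, not the authors' procedure: it considers only sequence-adjacent domains, measures the gap as the number of residues between one domain's end and the next domain's start, and uses invented domain names and boundaries.

```python
def close_domain_pairs(domains, max_gap=30):
    """Return adjacent domain pairs separated by fewer than `max_gap` residues.

    `domains` is a list of (name, start, end) tuples on one chain, with
    inclusive residue numbering.
    """
    ordered = sorted(domains, key=lambda d: d[1])
    pairs = []
    for (name1, _, end1), (name2, start2, _) in zip(ordered, ordered[1:]):
        gap = start2 - end1 - 1  # residues strictly between the two domains
        if gap < max_gap:
            pairs.append((name1, name2, gap))
    return pairs

# Hypothetical domain annotation for a single chain
chain_domains = [("A", 1, 100), ("B", 120, 250), ("C", 300, 400)]
likely_interacting = close_domain_pairs(chain_domains)
```

Pairs that pass the filter would be candidates for a predicted physical interface when the same domains occur on separate chains in another organism.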
49. STING Millennium Suite: integrated software for extensive analyses of 3d structures of proteins and their complexes.
- Author
-
Higa, Roberto H., Togawa, Roberto C., Montagner, Arnaldo J., Palandrani, Juliana C. F., Okimoto, Igor K. S., Kuser, Paula R., Yamagishi, Michel E. B., Mancini, Adauto L., and Neshich, Goran
- Subjects
- *
INTEGRATED software , *PROTEIN analysis , *STRUCTURAL bioinformatics , *MOLECULAR structure , *DNA - Abstract
Background: The integration of many aspects of protein/DNA structure analysis is an important requirement for software products in the general area of structural bioinformatics. In fact, there are too few software packages on the internet which can be described as successful in this respect. We might say that what is still missing is publicly available, web-based software for interactive analysis of the sequence/structure/function of proteins and their complexes with DNA and ligands. Some existing software packages do have a certain level of integration and do offer analysis of several structure-related parameters, but not to the extent generally demanded by users. Results: We report here a new version of the Sting Millennium Suite (SMS), which is fully accessible (including for local files at the client end), web-based software for molecular structure and sequence/structure/function analysis. The new SMS client version is now also operational on Linux machines, and it works with non-public PDB-formatted files (structures not deposited at the RCSB/PDB), eliminating the earlier requirement for registration if SMS components were to be used with a user's local files. At the same time, the new SMS offers some important additions and improvements, such as a link to ProTherm as well as significant re-engineering of the SMS component ConSSeq. Also, we have added 3 new SMS mirror sites to the existing network of global SMS servers: Argentina, Japan and Spain. Conclusion: SMS is an already established software package, and many key database and software servers worldwide either link to or host it. SMS (Sting Millennium Suite) is web-based, publicly available software developed to aid researchers in their quest to translate information about the structures of macromolecules into knowledge. SMS allows a user to interactively analyze molecular structures, cross-referencing visualized information with correlated information available across the internet. SMS is already used as a didactic tool by some universities. SMS analysis is now possible on Linux machines and with no requirement for registration when using local files. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
50. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics.
- Author
-
Cherkasov, Artem, Ho Sui, Shannan J., Brunham, Robert C., and Jones, Steven J. M.
- Subjects
- *
MOLECULAR structure , *GENOMES , *NUCLEOTIDE sequence , *STRUCTURAL bioinformatics , *WEIBULL distribution , *GENOMICS - Abstract
Background: We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results: We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions: The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
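The Weibull function referred to in this record can be illustrated with its standard two-parameter cumulative form, in which β is the shape parameter and η a scale parameter. This is a generic sketch: how the authors map fold occurrence onto the variable `x`, and which parameterisation they fit, is not stated in the abstract, so the mapping and the function name are left as assumptions.

```python
import math

def weibull_cdf(x: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull cumulative distribution: 1 - exp(-(x/eta)**beta)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / eta) ** beta))

# Illustrative parameter values only, not fitted to any genome
curve = [weibull_cdf(x, beta=1.5, eta=10.0) for x in range(0, 31, 5)]
```

With β below 1 the curve rises steeply at small `x`, and with β above 1 it rises more slowly at first, which is the kind of between-genome variation in the shape parameter the abstract describes.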