Author: "Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)" / Journal: bmc bioinformatics - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)"' showing total 12 results

1. Computational discovery of direct associations between GO terms and protein domains

Author: Seyed Ziaeddin Alborzi, David W. Ritchie, Marie-Dominique Devignes, Computational Algorithms for Protein Structures and Interactions (CAPSID), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria), Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), and Centre National de la Recherche Scientifique (CNRS)
Subjects: Protein domain, Research, Protein function, Computational Biology, Proteins, Molecular Sequence Annotation, Vector similarity, lcsh:Computer applications to medicine. Medical informatics, Gene Ontology, lcsh:Biology (General), Protein Domains, Area Under Curve, Protein structure, lcsh:R858-859.7, Amino Acid Sequence, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], Databases, Protein, lcsh:QH301-705.5, Algorithms
Abstract: Background: Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology). However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task. Results: We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach “CODAC” (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe “GODomainMiner” for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively. Conclusions: These associations could potentially be used to annotate many of the protein chains in the Protein Databank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2380-2) contains supplementary material, which is available to authorized users.
Published: 2018
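
Illustrative note for record 1: the abstract frames the problem as finding direct associations between two annotation sets through the neighbours they share in a tripartite graph. The following is only a minimal sketch of that general idea, not the CODAC or GODomainMiner method itself. It assumes proteins are the shared middle layer and uses cosine similarity between binary membership vectors as a stand-in score; the function name, the identifiers (`GO:A`, `D1`, `P1`) and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import sqrt


def common_neighbour_scores(protein_go, protein_domains):
    """Score (GO term, domain) pairs by the proteins they have in common.

    Assumption: the score is the cosine similarity of the two binary
    protein-membership vectors. This is a stand-in, not the published score.
    """
    go_proteins = defaultdict(set)
    domain_proteins = defaultdict(set)
    # Invert both annotation maps so each label points to its proteins.
    for protein, terms in protein_go.items():
        for term in terms:
            go_proteins[term].add(protein)
    for protein, domains in protein_domains.items():
        for domain in domains:
            domain_proteins[domain].add(protein)

    scores = {}
    for term, domain in product(go_proteins, domain_proteins):
        shared = go_proteins[term] & domain_proteins[domain]
        if shared:
            denominator = sqrt(len(go_proteins[term]) * len(domain_proteins[domain]))
            scores[(term, domain)] = len(shared) / denominator
    return scores


# Hypothetical toy annotations (placeholder identifiers, not real GO or Pfam IDs).
protein_go = {"P1": {"GO:A"}, "P2": {"GO:A", "GO:B"}, "P3": {"GO:B"}}
protein_domains = {"P1": {"D1"}, "P2": {"D1", "D2"}, "P3": {"D2"}}
scores = common_neighbour_scores(protein_go, protein_domains)
```

In this toy example the pairs that always co-occur on the same proteins receive the highest score, while pairs that co-occur only on the multi-domain protein `P2` score lower, which is the ambiguity the abstract describes.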

2. GPCRs from fusarium graminearum detection, modeling and virtual screening - the search for new routes to control head blight disease

Author: Roberto C. Togawa, Natália F. Martins, Emmanuel Bresso, Bernard Maigret, Kim E. Hammond-Kosack, Martin Urban, Computational Algorithms for Protein Structures and Interactions (CAPSID), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria), Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS), Université de Lorraine (UL), Embrapa Uva e Vinho (BRAZIL), Rothamsted Research (UK), Universidade Federal do Ceará = Federal University of Ceará (UFC), EMBRAPA, CAPES, CNPq (Brazil), and Biotechnology and Biological Sciences Research Council (BBSRC, UK)
Subjects: 0301 basic medicine, 030103 biophysics, In silico, Genomics, Computational biology, Biology, Molecular Dynamics Simulation, Biochemistry, Chemical library, Receptors, G-Protein-Coupled, Fungal Proteins, 03 medical and health sciences, Structural bioinformatics, chemistry.chemical_compound, Fusarium, Structural Biology, Homology modeling, Molecular Biology, Plant Diseases, 2. Zero hunger, Fungal protein, Virtual screening, business.industry, Applied Mathematics, Research, food and beverages, Computer Science Applications, Biotechnology, Fusarium graminearum, 030104 developmental biology, G-protein coupled receptors, Fusarium head blight, chemistry, 13. Climate action, Docking (molecular), [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], business, Signal Transduction
Abstract: Background: Fusarium graminearum (FG) is one of the major cereal-infecting pathogens, causing high economic losses worldwide and adverse effects on human and animal health. The development of new fungicides against FG is therefore important for reducing cereal infection and its economic impact. In the strategy for developing new fungicides, a critical step is the identification of new targets against which innovative chemical weapons can be designed. As several G-protein coupled receptors (GPCRs) are implicated in signaling pathways critical for fungal development and survival, such proteins could be valuable targets for reducing Fusarium growth and thereby preventing food contamination. Results: In this study, GPCRs were predicted in the FG proteome using a manually curated pipeline dedicated to the identification of GPCRs. Based on several successive filters, the most appropriate GPCR candidate target for developing new fungicides was selected. Searching for new compounds that block this particular target requires knowledge of its 3D structure. As no experimental X-ray structure of the selected protein was available, a 3D model was built by homology modeling. The model quality and stability were checked by 100 ns of molecular dynamics simulations prior to the virtual screening step. Two stable conformations representative of the conformational families of the protein were extracted from the 100 ns simulation and used for an ensemble docking campaign. The virtual screening step comprised the exploration of a chemical library of 11,000 compounds that were docked to the GPCR model. Among these compounds, we selected the ten top-ranked nontoxic molecules proposed for experimental testing to validate the in silico simulation. Conclusions: This study provides an integrated process merging genomics, structural bioinformatics and drug design for proposing innovative solutions to a worldwide threat to grain producers and consumers. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1342-9) contains supplementary material, which is available to authorized users.
Published: 2016

3. Integrative relational machine-learning for understanding drug side-effect profiles

Author: Michel Souchet, Gino Marchetti, Malika Smaïl-Tabbone, Emmanuel Bresso, Marie-Dominique Devignes, Arnaud Sinan Karaboga, Renaud Grisoni, Knowledge representation, reasonning (ORPAILLEUR), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS), and Harmonic Pharma
Subjects: Drug, Drug-Related Side Effects and Adverse Reactions, Side effect, Databases, Pharmaceutical, Computer science, media_common.quotation_subject, Statistical relational learning, Decision tree, Machine learning, computer.software_genre, 01 natural sciences, Biochemistry, 03 medical and health sciences, Side effect (computer science), Semantic similarity, Artificial Intelligence, Structural Biology, [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN], Molecule, Data mining, Molecular Biology, 030304 developmental biology, media_common, 0303 health sciences, Drug discovery, business.industry, Applied Mathematics, Decision Trees, Computational Biology, Reproducibility of Results, Relational machine learning, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], 0104 chemical sciences, Computer Science Applications, Clinical trial, 010404 medicinal & biomolecular chemistry, Identification (information), Drug development, Drug side-effects, Table (database), Data integration, Artificial intelligence, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], DNA microarray, business, computer, DrugBank, Research Article
Abstract: Background: Drug side effects represent a common reason for stopping drug development during clinical trials. Improving our ability to understand drug side effects is necessary to reduce attrition rates during drug development as well as the risk of discovering novel side effects in available drugs. Today, most investigations deal with isolated side effects and overlook possible redundancy and their frequent co-occurrence. Results: In this work, drug annotations are collected from the SIDER and DrugBank databases. Terms describing individual side effects reported in SIDER are clustered with a semantic similarity measure into term clusters (TCs). Maximal frequent itemsets are extracted from the resulting drug × TC binary table, leading to the identification of what we call side-effect profiles (SEPs). A SEP is defined as the longest combination of TCs which are shared by a significant number of drugs. Frequent SEPs are explored on the basis of integrated drug and target descriptors using two machine learning methods: decision trees and inductive logic programming. Although both methods yield explicit models, the inductive logic programming method performs relational learning and is able to exploit not only drug properties but also background knowledge. Learning efficiency is evaluated by cross-validation and direct testing with new molecules. Comparison of the two machine-learning methods shows that the inductive logic programming method displays a greater sensitivity than decision trees and successfully exploits background knowledge such as functional annotations and pathways of drug targets, thereby producing rich and expressive rules. All models and theories are available on a dedicated web site. Conclusions: Side-effect profiles covering a significant number of drugs have been extracted from a drug × side-effect association table. Integration of background knowledge concerning both chemical and biological spaces has been combined with a relational learning method for discovering rules which explicitly characterize drug-SEP associations. These rules are successfully used for predicting SEPs associated with new drugs.
Published: 2013
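
Illustrative note for record 3: the abstract describes extracting maximal frequent itemsets from a drug × TC binary table to obtain side-effect profiles. The sketch below shows what a maximal frequent itemset is on a tiny, made-up table. It is a brute-force enumeration for illustration only, not the mining procedure used in the paper, and the function name, drug names, TC labels and support threshold are all hypothetical.

```python
from itertools import combinations


def maximal_frequent_itemsets(drug_tcs, min_support):
    """Return itemsets of TCs shared by at least `min_support` drugs that
    have no frequent proper superset.

    Brute force over all subsets: exponential in the number of TCs, so it is
    only suitable for toy inputs.
    """
    items = sorted(set().union(*drug_tcs.values()))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = number of drugs annotated with every TC in the itemset.
            support = sum(1 for tcs in drug_tcs.values() if itemset <= tcs)
            if support >= min_support:
                frequent.append(itemset)
    # Keep only itemsets that are not strictly contained in another frequent one.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical drug x TC table, written as one set of term clusters per drug.
drug_tcs = {
    "drugA": {"TC1", "TC2", "TC3"},
    "drugB": {"TC1", "TC2"},
    "drugC": {"TC1", "TC2", "TC3"},
    "drugD": {"TC4"},
}
profiles_support_2 = maximal_frequent_itemsets(drug_tcs, min_support=2)
profiles_support_3 = maximal_frequent_itemsets(drug_tcs, min_support=3)
```

With a threshold of two drugs the only maximal itemset in this toy table is {TC1, TC2, TC3}; raising the threshold to three drugs shrinks it to {TC1, TC2}, which shows how the support threshold trades profile length against the number of drugs covered.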

4. Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns

Author: Jérémy Gruel, Nolwenn LeMeur, Michel LeBorgne, Nathalie Théret, Signalisation et Réponses aux Agents Infectieux et Chimiques (SeRAIC), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES), Biological systems and models, bioinformatics and sequences (SYMBIOSE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA), Institut National de Recherche en Informatique et en Automatique (Inria), Centre National de la Recherche Scientifique (CNRS), Inria Rennes – Bretagne Atlantique, Institut National de la Santé et de la Recherche Médicale, Association pour la Recherche contre le Cancer, Ligue Contre le Cancer, and Région Bretagne (PRIR n° 3193)
Subjects: Genomics, MESH: Algorithms, Computational biology, Biology, Phylogenetic footprinting, MESH: Base Sequence, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Conserved sequence, Conserved non-coding sequence, 03 medical and health sciences, MESH: Software, 0302 clinical medicine, Structural Biology, MESH: Promoter Regions, Genetic, Animals, Humans, MESH: Animals, MESH: Phylogeny, Promoter Regions, Genetic, lcsh:QH301-705.5, Molecular Biology, Conserved Sequence, Phylogeny, 030304 developmental biology, Regulation of gene expression, Genetics, 0303 health sciences, MESH: Conserved Sequence, MESH: Humans, Base Sequence, Applied Mathematics, MESH: Genomics, Promoter, MESH: Gene Expression Regulation, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Computer Science Applications, DNA binding site, lcsh:Biology (General), Gene Expression Regulation, lcsh:R858-859.7, DNA microarray, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], 030217 neurology & neurosurgery, Algorithms, Software, Research Article
Abstract: Background: Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results: Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally, we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions: Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
Published: 2010

5. Inferring the role of transcription factors in regulatory networks

Author: P. Veber, Carito Guziolowski, Michel Le Borgne, Ovidiu Radulescu, Anne Siegel, Biological systems and models, bioinformatics and sequences (SYMBIOSE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR), Université de Rennes 1 (UR1), Université de Rennes 2 (UR2), Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA), Institut National de Recherche en Informatique et en Automatique (Inria), Centre National de la Recherche Scientifique (CNRS), Inria Rennes – Bretagne Atlantique, Institut de Recherche Mathématique de Rennes (IRMAR), École normale supérieure - Rennes (ENS Rennes), AGROCAMPUS OUEST, and Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Subjects: Transcriptional Activation, Inference, Biology, lcsh:Computer applications to medicine. Medical informatics, computer.software_genre, Models, Biological, Biochemistry, 03 medical and health sciences, 0302 clinical medicine, Structural Biology, [ INFO.INFO-BI ] Computer Science [cs]/Bioinformatics [q-bio.QM], Computer Simulation, Observability, [ SDV.BIBS ] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], lcsh:QH301-705.5, Molecular Biology, ComputingMilieux_MISCELLANEOUS, 030304 developmental biology, Regulation of gene expression, 0303 health sciences, Methodology Article, Gene Expression Profiling, Applied Mathematics, Small number, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Reconstruction method, Computer Science Applications, Gene expression profiling, lcsh:Biology (General), Gene Expression Regulation, Expression data, lcsh:R858-859.7, Data mining, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], DNA microarray, computer, Algorithms, 030217 neurology & neurosurgery, Signal Transduction, Transcription Factors
Abstract: Background: Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. Results: We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions), by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions. Conclusion: Our approach requires neither accurate expression levels nor time series. Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments is enough to determine a significant portion of regulatory effects. This is a key practical asset compared to statistical methods for network reconstruction. We demonstrate that our approach is able to provide accurate predictions, even when the network is incomplete and the data is noisy.
Published: 2008
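
Illustrative note for record 5: the abstract says regulation signs (inducer or repressor) are inferred from a consistency rule between the network and the signs of variation in expression profiles, but it does not spell the rule out. The sketch below therefore assumes one plausible rule: when a gene and all of its regulators are observed, the gene's variation must equal the influence (edge sign times regulator variation) of at least one regulator. An edge sign counts as inferred when it is the same in every sign assignment consistent with all profiles. The function names, the rule as stated here, and the toy network are assumptions for illustration, and the exhaustive enumeration is only workable for very small networks.

```python
from itertools import product


def is_consistent(edge_signs, profile):
    """Check one profile against signed edges under the assumed rule."""
    incoming = {}
    for (source, target), sign in edge_signs.items():
        incoming.setdefault(target, []).append((source, sign))
    for target, regulators in incoming.items():
        # Only constrain targets whose own variation and all regulators are observed.
        if target not in profile or any(src not in profile for src, _ in regulators):
            continue
        influences = {sign * profile[src] for src, sign in regulators}
        if profile[target] not in influences:
            return False
    return True


def infer_edge_signs(edges, profiles):
    """Return the edge signs shared by every consistent assignment.

    Returns None when no sign assignment is consistent with all profiles.
    """
    inferred = None
    for signs in product((+1, -1), repeat=len(edges)):
        assignment = dict(zip(edges, signs))
        if not all(is_consistent(assignment, p) for p in profiles):
            continue
        if inferred is None:
            inferred = dict(assignment)
        else:
            # Drop edges whose sign differs between consistent assignments.
            for edge in list(inferred):
                if inferred[edge] != assignment[edge]:
                    del inferred[edge]
    return inferred


# Hypothetical network: genes A and B both regulate gene C.
edges = [("A", "C"), ("B", "C")]
profiles = [
    {"A": +1, "B": +1, "C": -1},
    {"A": +1, "B": -1, "C": +1},
]
inferred = infer_edge_signs(edges, profiles)
```

In this toy case two profiles are enough to fix the B to C edge as a repression, while the A to C edge stays undetermined, which mirrors the abstract's point that only part of a network can be annotated from a given set of profiles.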

6. Browsing repeats in genomes: Pygram and an application to non-coding region analysis

Author: Anne-Sophie Valin, Jacques Nicolas, Patrick Durand, Frédéric Mahé, Biological systems and models, bioinformatics and sequences (SYMBIOSE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES), Université de Rennes 2 (UR2), Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA), Institut National de Recherche en Informatique et en Automatique (Inria), Centre National de la Recherche Scientifique (CNRS), Inria Rennes – Bretagne Atlantique, Ecosystèmes, biodiversité, évolution [Rennes] (ECOBIO), Institut Ecologie et Environnement (INEE), Observatoire des Sciences de l'Univers de Rennes (OSUR), Institut national des sciences de l'Univers (INSU - CNRS), and Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)
Subjects: Suffix tree, Genome, Viral, Computational biology, Biology, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Genome, law.invention, 03 medical and health sciences, Genome, Archaeal, Structural Biology, law, [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN], CRISPR, Coding region, Codon, Molecular Biology, lcsh:QH301-705.5, Repetitive Sequences, Nucleic Acid, 030304 developmental biology, Genetics, 0303 health sciences, Indexed file, 030306 microbiology, Methodology Article, Applied Mathematics, Computer Science Applications, Visualization, lcsh:Biology (General), Horizontal gene transfer, lcsh:R858-859.7, DNA microarray, Software
Abstract: Background: A large number of studies on genome sequences have revealed the major role played by repeated sequences in the structure, function, dynamics and evolution of genomes. In-depth repeat analysis requires specialized methods, including visualization techniques, to achieve optimum exploratory power. Results: This article presents Pygram, a new visualization application for investigating the organization of repeated sequences in complete genome sequences. The application projects data from a repeat index file on the analysed sequences, and by combining this principle with a query system, is capable of locating repeated sequences with specific properties. In short, Pygram provides an efficient, graphical browser for studying repeats. Implementation of the complete configuration is illustrated in an analysis of CRISPR structures in Archaea genomes and the detection of horizontal transfer between Archaea and Viruses. Conclusion: By proposing a new visualization environment to analyse repeated sequences, this application aims to increase the efficiency of laboratories involved in investigating repeat organization in single genomes or across several genomes.
Published: 2006

7. JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow

Author: Veronica Martinez, Lee A. Meisel, Andrea Morales, Ariel Orellana, Rodrigo Caroca, Reinaldo Campos-Vargas, Juan Saba, Carito Guziolowski, Jonathan Maldonado, Verónica Cambiazo, Herman Silva, Mariano Latorre, Mauricio González, Julio Retamales, Paula Vizoso, Millennium Nucleus in Plant Cell Biology and Plant Biotechnology Center, Universidad Andrés Bello [Santiago] (UNAB), Laboratorio de Genética Molecular Vegetal, Departamento de Biología, Biological systems and models, bioinformatics and sequences (SYMBIOSE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratorio de Bioinformática y Expresión Génica, INIA La Platina, Ministerio de Agricultura, Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), and Institut National des Sciences Appliquées (INSA)-Institut National des Sciences 
Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique
Subjects: 0106 biological sciences, Data management, Biology, lcsh:Computer applications to medicine. Medical informatics, computer.software_genre, 01 natural sciences, Biochemistry, Data type, Clipboard, Management Information Systems, World Wide Web, Databases, 03 medical and health sciences, Upload, Structural Biology, Databases, Genetic, lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, Expressed Sequence Tags, 0303 health sciences, Chromatography, Genome, Nucleic Acid, Database, business.industry, Applied Mathematics, food and beverages, Computational Biology, Genome project, Genomics, Sequence Analysis, DNA, Pipeline (software), [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Computer Science Applications, Management information systems, Workflow, lcsh:Biology (General), lcsh:R858-859.7, Database Management Systems, Programming Languages, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], business, Databases, Nucleic Acid, Sequence Analysis, computer, Software, 010606 plant biology & botany
Abstract: Background Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. Results In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. Conclusion JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. 
JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from http://genoma.unab.cl/juice_system/ or http://www.genomavegetal.cl/juice_system/.
Published: 2006

8. PLAST: parallel local alignment search tool for database comparison

Author: Van Hoa Nguyen, Dominique Lavenier, Biological systems and models, bioinformatics and sequences (SYMBIOSE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), and Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique
Subjects: Speedup, Theoretical computer science, Databases, Factual, Computer science, Parallel algorithm, Information Storage and Retrieval, Parallel computing, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Instruction set, 03 medical and health sciences, 0302 clinical medicine, Microcomputers, Structural Biology, SIMD, lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, Smith–Waterman algorithm, 0303 health sciences, Multi-core processor, Applied Mathematics, Computational Biology, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Computer Science Applications, lcsh:Biology (General), Multithreading, Programming paradigm, lcsh:R858-859.7, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], 030217 neurology & neurosurgery, Software, Algorithms
Abstract: Background Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. Conclusion A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.
Published: 2009

9. Optimal neighborhood indexing for protein similarity search

Author: Mathieu Giraud, Gregory Kucherov, Van Hoa Nguyen, Dominique Lavenier, Laurent Noé, Pierre Peterlongo, Biological systems and models, bioinformatics and sequences (SYMBIOSE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Sequential Learning (SEQUOIA), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, INRIA ARC Flash, Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences 
Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), and Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique
Subjects: Speedup, Sequence analysis, Computer science, Abstracting and Indexing, 030303 biophysics, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], Information Storage and Retrieval, Sequence alignment, lcsh:Computer applications to medicine. Medical informatics, computer.software_genre, Biochemistry, Reduction (complexity), 03 medical and health sciences, Text mining, Similarity (network science), Structural Biology, Sequence Analysis, Protein, Limit (mathematics), Databases, Protein, lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, chemistry.chemical_classification, 0303 health sciences, Biological data, business.industry, Applied Mathematics, Search engine indexing, Computational Biology, Proteins, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Computer Science Applications, Amino acid, lcsh:Biology (General), chemistry, lcsh:R858-859.7, Data mining, DNA microarray, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], business, Heuristics, computer, Algorithm, Sequence Alignment, Algorithms, Research Article
Abstract: Background Similarity inference, one of the main bioinformatics tasks, has to cope with the exponential growth of biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional information to limit the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. Results The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood reduces the memory involved in the process by 35%, without sacrificing the quality of results or the computational time. Second, our approach led us to develop a new kind of substitution score matrices and their associated e-value parameters. In contrast to usual matrices, these matrices are rectangular since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and we provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum. Conclusion We propose a practical index size reduction of the neighborhood data that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.
Published: 2008
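
Note: the alphabet-reduction idea in the abstract of record 9 can be illustrated with a minimal sketch. The grouping of amino acids below is a hypothetical example chosen for illustration, not the reduced alphabet or the index layout used in the paper, and the names `REDUCED_GROUPS`, `reduce_sequence`, `bits_per_residue`, and `neighborhood_bits` are assumptions introduced here. The sketch only shows why fewer symbols mean fewer bits per stored neighborhood residue.

```python
# Hypothetical grouping of the 20 standard amino acids into 7 classes.
# This grouping is an illustration only; it is NOT taken from the paper.
REDUCED_GROUPS = ["AGST", "C", "DENQ", "FWY", "HKR", "ILMV", "P"]

# Map every residue to a representative letter (the first letter of its group).
RESIDUE_TO_CLASS = {res: group[0] for group in REDUCED_GROUPS for res in group}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence over the reduced alphabet."""
    return "".join(RESIDUE_TO_CLASS[res] for res in seq.upper())


def bits_per_residue(alphabet_size: int) -> int:
    """Smallest whole number of bits that can encode one symbol."""
    return max(1, (alphabet_size - 1).bit_length())


def neighborhood_bits(alphabet_size: int, length: int) -> int:
    """Bits needed to store a neighborhood of `length` residues."""
    return bits_per_residue(alphabet_size) * length


if __name__ == "__main__":
    print(reduce_sequence("MKVLAT"))
    print(neighborhood_bits(20, 4), neighborhood_bits(len(REDUCED_GROUPS), 4))
```

With this toy grouping, a residue needs 3 bits instead of 5, so the same storage budget can hold a longer neighborhood. The 35% figure reported in the abstract comes from the paper's own alphabet and index design and is not reproduced by this sketch.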

10. ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains

Author: Alborzi, Seyed Ziaeddin, Devignes, Marie-Dominique, Ritchie, David, Computational Algorithms for Protein Structures and Interactions (CAPSID), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Protein domain, Pfam domain, Content-based filtering, Enzyme commission number, [SDV]Life Sciences [q-bio], Protein function, food and beverages, Biochemistry, Molecular Biology, Computer Science Applications
Abstract: Background Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level.

11. Homology modelling of protein-protein complexes: a simple method and its possibilities and limitations

Author: Thomas Simonson, Guillaume Launay, Biological systems and models, bioinformatics and sequences (SYMBIOSE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire de Biochimie de l'Ecole polytechnique (BIOC), Centre National de la Recherche Scientifique (CNRS)-École polytechnique (X), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, and École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Computer science, Protein Conformation, 030303 biophysics, Binding energy, Structural alignment, Bioinformatics, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Homology (biology), 03 medical and health sciences, Protein structure, Structural Biology, Protein Interaction Domains and Motifs, Native structure, lcsh:QH301-705.5, Molecular Biology, ComputingMilieux_MISCELLANEOUS, 030304 developmental biology, 0303 health sciences, Applied Mathematics, Protein protein, Proteins, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Sequence identity, Computer Science Applications, lcsh:Biology (General), Structural Homology, Protein, lcsh:R858-859.7, DNA microarray, Threading (protein sequence), [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], Biological system, Algorithms, Research Article, Protein Binding
Abstract: Background Structure-based computational methods are needed to help identify and characterize protein-protein complexes and their function. For individual proteins, the most successful technique is homology modelling. We investigate a simple extension of this technique to protein-protein complexes. We consider a large set of complexes of known structures, involving pairs of single-domain proteins. The complexes are compared with each other to establish their sequence and structural similarities and the relation between the two. Compared to earlier studies, a simpler dataset, a simpler structural alignment procedure, and an additional energy criterion are used. Next, we compare the Xray structures to models obtained by threading the native sequence onto other, homologous complexes. An elementary requirement for a successful energy function is to rank the native structure above any threaded structure. We use the DFIREβ energy function, whose quality and complexity are typical of the models used today. Finally, we compare near-native models to distinctly non-native models. Results If weakly stable complexes are excluded (defined by a binding energy cutoff), as well as a few unusual complexes, a simple homology principle holds: complexes that share more than 35% sequence identity share similar structures and interaction modes; this principle was less clearcut in earlier studies. The energy function was then tested for its ability to identify experimental structures among sets of decoys, produced by a simple threading procedure. On average, the experimental structure is ranked above 92% of the alternate structures. Thus, discrimination of the native structure is good but not perfect. The discrimination of near-native structures is fair. Typically, a single, alternate, non-native binding mode exists that has a native-like energy. 
Some of the associated failures may correspond to genuine, alternate binding modes and/or native complexes that are artefacts of the crystal environment. In other cases, additional model filtering with more sophisticated tools is needed. Conclusion The results suggest that the simple modelling procedure applied here could help identify and characterize protein-protein complexes. The next step is to apply it on a genomic scale.

12. A fast method for calculating reliable event supports in tree reconciliations via Pareto optimality

Author: Celine Scornavacca, Thu-Hien To, Edwin Jacox, Vincent Ranwez, Institut des Sciences de l'Evolution de Montpellier (UMR ISEM), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS), Institut de Recherche pour le Développement (IRD [France-Ouest]), École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL), Institut de Biologie Computationnelle (IBC), Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Amélioration génétique et adaptation des plantes méditerranéennes et tropicales (UMR AGAP), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut National de la Recherche Agronomique (INRA)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), French Agence Nationale de la Recherche Investissements d'Avenir/Bioinformatique : ANR-10-BINF-01-02, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École pratique des hautes études (EPHE)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), École pratique des hautes études (EPHE), Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en 
Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro)-Institut National de la Recherche Agronomique (INRA)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre National de la Recherche Scientifique (CNRS)-Institut de recherche pour le développement [IRD] : UR226, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro), and Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Subjects: [SDV.SA]Life Sciences [q-bio]/Agricultural sciences, programme informatique, multi-objective particle swarm optimization (mopso), Supports, Reliability (computer networking), 0206 medical engineering, 02 engineering and technology, Biology, computer.software_genre, Gene evolution, Biochemistry, Set (abstract data type), évolution génétique, Evolution, Molecular, 03 medical and health sciences, Structural Biology, computer program, Tree reconciliation, Proteobacteria, Computer Simulation, bioinformatique, Molecular Biology, Phylogeny, 030304 developmental biology, Event (probability theory), 0303 health sciences, arbre, Applied Mathematics, Pareto principle, Reproducibility of Results, tree reconciliation, gene evolution, phylogenetics, parsimony, supports, Agricultural sciences, Computer Science Applications, Phylogenetics, Range (mathematics), Tree (data structure), Genes, Bacterial, A priori and a posteriori, Graph (abstract data type), Data mining, computer, Algorithm, Parsimony, 020602 bioinformatics, Sciences agricoles, algorithme, Algorithms, génétique appliquée, Research Article
Abstract: Background Given a gene and a species tree, reconciliation methods attempt to retrieve the macro-evolutionary events that best explain the discrepancies between the two tree topologies. The DTL parsimonious approach searches for a most parsimonious reconciliation between a gene tree and a (dated) species tree, considering four possible macro-evolutionary events (speciation, duplication, transfer, and loss) with specific costs. Unfortunately, many events are erroneously predicted due to errors in the input trees, inappropriate input cost values or because of the existence of several equally parsimonious scenarios. It is thus crucial to provide a measure of the reliability for predicted events. It has been recently proposed that the reliability of an event can be estimated via its frequency in the set of most parsimonious reconciliations obtained using a variety of reasonable input cost vectors. To compute such a support, a straightforward but time-consuming approach is to generate the costs slightly departing from the original ones, independently compute the set of all most parsimonious reconciliations for each vector, and combine these sets a posteriori. Another proposed approach uses Pareto-optimality to partition cost values into regions which induce reconciliations with the same number of DTL events. The support of an event is then defined as its frequency in the set of regions. However, often, the number of regions is not large enough to provide reliable supports. Results We present here a method to compute efficiently event supports via a polynomial-sized graph, which can represent all reconciliations for several different costs. Moreover, two methods are proposed to take into account alternative input costs: either explicitly providing an input cost range or allowing a tolerance for the over cost of a reconciliation. 
Our methods are faster than the region based method, substantially faster than the sampling-costs approach, and have a higher event-prediction accuracy on simulated data. Conclusions We propose a new approach to improve the accuracy of event supports for parsimonious reconciliation methods to account for uncertainty in the input costs. Furthermore, because of their speed, our methods can be used on large gene families. Our algorithms are implemented in the ecceTERA program, freely available from http://mbb.univ-montp2.fr/MBB/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0803-x) contains supplementary material, which is available to authorized users.
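
Note: the abstract of record 12 describes estimating the reliability of an event by its frequency in a set of most parsimonious reconciliations. A minimal sketch of that frequency computation follows. It assumes each reconciliation is already available as a collection of events, and it does not implement the polynomial-sized graph or the Pareto-optimality method of the paper. The function name `event_supports` and the `(kind, gene_node, species_node)` event tuples are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def event_supports(
    reconciliations: Iterable[Iterable[Hashable]],
) -> Dict[Hashable, float]:
    """Return, for each event, the fraction of reconciliations containing it.

    Each reconciliation is treated as a set of events, so an event listed
    twice in one reconciliation is still counted once for it.
    """
    recs = [frozenset(rec) for rec in reconciliations]
    if not recs:
        return {}
    counts = Counter(event for rec in recs for event in rec)
    return {event: count / len(recs) for event, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical events: (kind, gene_node, species_node).
    recs = [
        {("duplication", "g1", "s3"), ("transfer", "g2", "s5")},
        {("duplication", "g1", "s3"), ("loss", "g4", "s2")},
        {("duplication", "g1", "s3"), ("transfer", "g2", "s5")},
    ]
    for event, support in sorted(event_supports(recs).items()):
        print(event, round(support, 2))
```

In this toy input the duplication appears in all three reconciliations and so gets support 1.0, while the transfer appears in two of three. Enumerating reconciliations explicitly, as done here, corresponds to the straightforward but time-consuming approach that the abstract contrasts with the proposed graph-based method.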


