54 results for "Anne-Lise Veuthey"
Search Results
2. Recherche d'information et analyse bibliographique appliquées à la mise à jour automatique de Swiss-Prot [Information retrieval and bibliographic analysis applied to the automatic updating of Swiss-Prot].
- Author
-
Imad Tbahriti, Anne-Lise Veuthey, Patrick Ruch, and Julien Gobeill
- Published
- 2007
3. Using Discourse Analysis to Improve Text Categorization in MEDLINE.
- Author
-
Patrick Ruch, Antoine Geissbühler, Julien Gobeill, Frédérique Lisacek, Imad Tbahriti, Anne-Lise Veuthey, and Alan R. Aronson
- Published
- 2007
4. Latent Argumentative Pruning for Compact MEDLINE Indexing.
- Author
-
Patrick Ruch, Robert H. Baud, Johan Marty, Antoine Geissbühler, Imad Tbahriti, and Anne-Lise Veuthey
- Published
- 2005
5. Extracting Key Sentences with Latent Argumentative Structuring.
- Author
-
Patrick Ruch, Robert H. Baud, Christine Chichester, Antoine Geissbühler, Frédérique Lisacek, Johan Marty, Dietrich Rebholz-Schuhmann, Imad Tbahriti, and Anne-Lise Veuthey
- Published
- 2005
6. Corpus-Based vs. Model-Based Selection of Relevant Features.
- Author
-
Cyril Goutte, Pavel B. Dobrokhotov, Éric Gaussier, and Anne-Lise Veuthey
- Published
- 2004
7. Motifs tree: a new method for predicting post-translational modifications.
- Author
-
Christophe Charpilloz, Anne-Lise Veuthey, Bastien Chopard, and Jean-Luc Falcone
- Published
- 2014
8. Combining NLP and probabilistic categorisation for document and term selection for Swiss-Prot medical annotation.
- Author
-
Pavel B. Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, and Éric Gaussier
- Published
- 2003
9. A Probabilistic Information Retrieval Approach to Medical Annotation in SWISS-PROT.
- Author
-
Pavel B. Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, and Éric Gaussier
- Published
- 2003
10. Using argumentation to extract key sentences from biomedical abstracts.
- Author
-
Patrick Ruch, Célia Boyer, Christine Chichester, Imad Tbahriti, Antoine Geissbühler, Paul Fabry, Julien Gobeill, Violaine Pillet, Dietrich Rebholz-Schuhmann, Christian Lovis, and Anne-Lise Veuthey
- Published
- 2007
11. Retrieving Mutation-Specific Information for Human proteins in UniProt/Swiss-PROT knowledgebase.
- Author
-
Yum Lina Yip, Nathalie Lachenal, Violaine Pillet, and Anne-Lise Veuthey
- Published
- 2007
12. Assisting medical annotation in Swiss-Prot using statistical classifiers.
- Author
-
Pavel B. Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, and Éric Gaussier
- Published
- 2005
13. Automated annotation of microbial proteomes in SWISS-PROT.
- Author
-
Alexandre Gattiker, Karine Michoud, Catherine Rivoire, Andrea H. Auchincloss, Elisabeth Coudert, Tania Lima, Paul J. Kersey, Marco Pagni, Christian J. A. Sigrist, Corinne Lachaize, Anne-Lise Veuthey, Elisabeth Gasteiger, and Amos Bairoch
- Published
- 2003
14. Text mining for the biocuration workflow.
- Author
-
Lynette Hirschman, Gully A. P. C. Burns, Martin Krallinger, Cecilia N. Arighi, K. Bretonnel Cohen, Alfonso Valencia, Cathy H. Wu, Andrew Chatr-aryamontri, Karen G. Dowell, Eva Huala, Anália Lourenço, Robert S. Nash, Anne-Lise Veuthey, Thomas C. Wiegers, and Andrew G. Winter
- Published
- 2012
15. Mapping proteins to disease terminologies: from UniProt to MeSH.
- Author
-
Anaïs Mottaz, Yum Lina Yip, Patrick Ruch, and Anne-Lise Veuthey
- Published
- 2008
16. Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction.
- Author
-
Julien Gobeill, Imad Tbahriti, Frédéric Ehrler, Anaïs Mottaz, Anne-Lise Veuthey, and Patrick Ruch
- Published
- 2008
17. Mapping protein information to disease terminologies.
- Author
-
Anaïs Mottaz, Yum Lina Yip, Patrick Ruch, and Anne-Lise Veuthey
- Published
- 2007
18. Easy retrieval of single amino-acid polymorphisms and phenotype information using SwissVar.
- Author
-
Anaïs Mottaz, Fabrice P. A. David, Anne-Lise Veuthey, and Yum Lina Yip
- Published
- 2010
19. GPSDB: a new database for synonyms expansion of gene and protein names.
- Author
-
Violaine Pillet, Marc Zehnder, Alexander K. Seewald, Anne-Lise Veuthey, and Johann Petrak
- Published
- 2005
20. Application of text-mining for updating protein post-translational modification annotation in UniProtKB.
- Author
-
Anne-Lise Veuthey, Alan J. Bridge, Julien Gobeill, Patrick Ruch, Johanna R. McEntyre, Lydie Bougueleret, and Ioannis Xenarios
- Published
- 2013
21. GPSDB: a new database for synonyms expansion of gene and protein names
- Author
-
Johann Petrak, Anne-Lise Veuthey, Violaine Pillet, Alexander K. Seewald, and Marc Zehnder
- Subjects
Statistics and Probability; MEDLINE; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Information Storage and Retrieval; Documentation; Biology; computer.software_genre; Biochemistry; Set (abstract data type); User-Computer Interface; Terminology as Topic; Databases, Protein; Molecular Biology; Gene; Database catalog; Natural Language Processing; Database; Information Dissemination; Proteins; Computer Science Applications; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Vocabulary, Controlled; Computational Theory and Mathematics; Search interface; Database Management Systems; ComputingMethodologies_GENERAL; computer
- Abstract
Summary: We present a new database, GPSDB (Gene and Protein Synonyms DataBase) which collects gene/protein names, in a species specific way, from 14 main biological resources. A web-based search interface gives access to the database: given a gene/protein name, it retrieves all synonyms for this entity and queries Medline with a set of user-selected terms. Availability: GPSDB is freely available from http://biomint.oefai.at/ Contact: [email protected]
- Published
- 2017
22. Easy retrieval of single amino-acid polymorphisms and phenotype information using SwissVar
- Author
-
Anaïs Mottaz, Fabrice Pierre André David, Anne-Lise Veuthey, and Yum Lina Yip
- Subjects
Statistics and Probability; Proteomics; Databases and Ontologies; Context (language use); Computational biology; Biology; Biochemistry; Polymorphism, Single Nucleotide; 03 medical and health sciences; Search engine; 0302 clinical medicine; Single amino acid; Amino Acid Sequence; Amino Acids; Databases, Protein; Molecular Biology; 030304 developmental biology; Genetics; Supplementary data; 0303 health sciences; Proteins; Phenotype; Computer Science Applications; Computational Mathematics; Applications Note; Computational Theory and Mathematics; Key (cryptography); UniProt; 030217 neurology & neurosurgery
- Abstract
Summary: The SwissVar portal provides access to a comprehensive collection of single amino acid polymorphisms and diseases in the UniProtKB/Swiss-Prot database via a unique search engine. In particular, it gives direct access to the newly improved Swiss-Prot variant pages. The key strength of this portal is that it provides a possibility to query for similar diseases, as well as the underlying protein products and the molecular details of each variant. In the context of the recently proposed molecular view on diseases, the SwissVar portal should be in a unique position to provide valuable information for researchers and to advance research in this area. Availability: The SwissVar portal is available at www.expasy.org/swissvar Contact: anais.mottaz@isb-sib.ch; lina.yip@isb-sib.ch Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2017
23. Cellular and Subcellular Localization of Hexokinase, Glutamate Dehydrogenase, and Alanine Aminotransferase in the Honeybee Drone Retina
- Author
-
Lourdes Millan de Ruiz, M. Tsacopoulos, Philippe Perrottet, and Anne-Lise Veuthey
- Subjects
Male; Adenylate kinase; Biology; Mitochondrion; Biochemistry; Retina; Cellular and Molecular Neuroscience; chemistry.chemical_compound; Cytosol; Glutamate Dehydrogenase; Hexokinase; Microsomes; Animals; Phosphoglycerate kinase; Glutamate dehydrogenase; Succinate dehydrogenase; Alanine Transaminase; Bees; Subcellular localization; Mitochondria; Microscopy, Electron; chemistry; biology.protein; Photoreceptor Cells, Invertebrate; Neuroglia; Subcellular Fractions
- Abstract
Subcellular localization of hexokinase in the honeybee drone retina was examined following fractionation of cell homogenate using differential centrifugation. Nearly all hexokinase activity was found in the cytosolic fraction, following a similar distribution as the cytosolic enzymatic marker, phosphoglycerate kinase. The distribution of enzymatic markers of mitochondria (succinate dehydrogenase, rotenone-insensitive cytochrome c reductase, and adenylate kinase) indicated that the outer mitochondrial membrane was partly damaged, but their distributions were different from that of hexokinase. The activity of hexokinase in purified suspensions of cells was fivefold higher in glial cells than in photoreceptors. This result is consistent with the hypothesis based on quantitative 2-deoxy[3H]glucose autoradiography that only glial cells phosphorylate significant amounts of glucose to glucose-6-phosphate. The activities of alanine aminotransferase and to a lesser extent of glutamate dehydrogenase were higher in the cytosolic than in the mitochondrial fraction. This important cytosolic activity of glutamate dehydrogenase was consistent with the higher activity found in mitochondria-poor glial cells. In conclusion, this distribution of enzymes is consistent with the model of metabolic interactions between glial and photoreceptor cells in the intact bee retina.
- Published
- 2008
24. Mapping protein information to disease terminologies
- Author
-
Patrick Ruch, Yum Lina Yip, Anaïs Mottaz, and Anne-Lise Veuthey
- Subjects
Disease description; Set (abstract data type); Information retrieval; Computer science; UniProt Knowledgebase; Data processing, computer science, computer systems; General Medicine; Disease; TP248.13-248.65; Biotechnology; Terminology
- Abstract
In order to improve the accessibility of genomic and proteomic information to medical researchers, we have developed a procedure to link biological information on proteins involved in diseases to the MeSH and ICD-10 disease terminologies. For this purpose, we took advantage of the manually curated disease annotations in more than 2,000 human protein entries of the UniProt KnowledgeBase. We mapped disease names extracted from the entry comment lines or from the corresponding OMIM entry to the MeSH. The method was assessed on a benchmark set of 200 manually mapped disease comment lines. We obtained a recall of 54% for 91% precision. The same procedure was used to map the more than 3,000 diseases in Swiss-Prot to MeSH with comparable efficiency. Tested on ICD-10, the coverage of the mapped terms was lower, which could be explained by the coarse-grained structure of this terminology for hereditary disease description. The mapping is provided as supplementary material at http://research.isbsib.ch/unimed.
- Published
- 2007
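The abstract of result 24 quotes recall and precision separately. As a rough worked example, the two figures can be combined into a single F-score. The F-score itself is not reported in the abstract; the function name `f_score` and the choice of the balanced F1 are assumptions made here for illustration only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1.0 gives the balanced F1 score.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract for the 200-line benchmark set.
precision = 0.91
recall = 0.54

# Derived here for illustration; not a value stated in the abstract.
f1 = f_score(precision, recall)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.3f}")
```

The harmonic mean sits closer to the lower of the two values, so a high-precision, moderate-recall mapping yields an F1 nearer to the recall figure.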
25. Using argumentation to extract key sentences from biomedical abstracts
- Author
-
Violaine Pillet, Célia Boyer, Julien Gobeill, Imad Tbahriti, Antoine Geissbuhler, Christine Chichester, Anne-Lise Veuthey, Paul Fabry, Christian Lovis, Patrick Ruch, and Dietrich Rebholz-Schuhmann
- Subjects
Argumentative; Abstracting and Indexing; Computer science; MEDLINE; Information Storage and Retrieval; Health Informatics; computer.software_genre; Artificial Intelligence; Terminology as Topic; Selection (linguistics); Unique key; Natural Language Processing; Information retrieval; business.industry; Libraries, Digital; Bayes Theorem; Class (biology); Vocabulary, Controlled; Bibliometrics; Key (cryptography); Artificial intelligence; Periodicals as Topic; Heuristics; business; computer; Sentence; Natural language processing
- Abstract
PROBLEM: key word assignment has been largely used in MEDLINE to provide an indicative “gist” of the content of articles and to help retrieving biomedical articles. Abstracts are also used for this purpose. However with usually more than 300 words, MEDLINE abstracts can still be regarded as long documents; therefore we design a system to select a unique key sentence. This key sentence must be indicative of the article's content and we assume that abstract's conclusions are good candidates. We design and assess the performance of an automatic key sentence selector, which classifies sentences into four argumentative moves: PURPOSE, METHODS, RESULTS and CONCLUSION. METHODS: we rely on Bayesian classifiers trained on automatically acquired data. Features representation, selection and weighting are reported and classification effectiveness is evaluated on the four classes using confusion matrices. We also explore the use of simple heuristics to take the position of sentences into account. Recall, precision and F-scores are computed for the CONCLUSION class. For the CONCLUSION class, the F-score reaches 84%. Automatic argumentative classification using Bayesian learners is feasible on MEDLINE abstracts and should help user navigation in such repositories.
- Published
- 2007
26. N-Terminal myristoylation predictions by ensembles of neural networks
- Author
-
Anne-Lise Veuthey, Séverine Duvaud, Cédric Yvon, and Guido Bologna
- Subjects
Databases, Factual; Glycine; Decision tree; Word error rate; PROSITE; Biology; Machine learning; computer.software_genre; Sensitivity and Specificity; Biochemistry; Artificial Intelligence; Amino Acid Sequence; Amino Acids; Molecular Biology; Probability; Myristoylation; Artificial neural network; business.industry; Pattern recognition; lipids (amino acids, peptides, and proteins); Neural Networks, Computer; Artificial intelligence; business; Myristic Acids; Protein Processing, Post-Translational; computer
- Abstract
N-terminal myristoylation is a post-translational modification that causes the addition of a myristate to a glycine in the N-terminal end of the amino acid chain. This work presents neural network (NN) models that learn to discriminate myristoylated and nonmyristoylated proteins. Ensembles of 25 NNs and decision trees were trained on 390 positive sequences and 327 negative sequences. Experiments showed that NN ensembles were more accurate than decision tree ensembles. Our NN predictor evaluated by the leave-one-out procedure, obtained a false positive error rate equal to 2.1%. That was better than the PROSITE pattern for myristoylation for which the false positive error rate was 22.3%. On a recent version of Swiss-Prot (41.2), the NN ensemble predicted 876 myristoylated proteins, while 1150 proteins were predicted by the PROSITE pattern for myristoylation. Finally, compared to the well-known NMT predictor, the NN predictor gave similar results. Our tool is available under http://www.expasy.org/tools/myristoylator/myristoylator.html.
- Published
- 2004
27. Combining NLP and probabilistic categorisation for document and term selection for Swiss-Prot medical annotation
- Author
-
Anne-Lise Veuthey, Pavel B. Dobrokhotov, Cyril Goutte, and Eric Gaussier
- Subjects
Statistics and Probability; PubMed; Abstracting and Indexing; Computer science; Documentation; Temporal annotation; computer.software_genre; Biochemistry; Pattern Recognition, Automated; Annotation; Artificial Intelligence; Selection (linguistics); Relevance (information retrieval); Databases, Protein; Molecular Biology; Natural Language Processing; Probabilistic classification; Models, Statistical; Information retrieval; business.industry; Probabilistic logic; Proteins; Computer Science Applications; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; Artificial intelligence; Periodicals as Topic; UniProt; business; computer; Algorithms; Natural language processing
- Abstract
Motivation: Searching relevant publications for manual database annotation is a tedious task. In this paper, we apply a combination of Natural Language Processing (NLP) and probabilistic classification to re-rank documents returned by PubMed according to their relevance to Swiss-Prot annotation, and to identify significant terms in the documents. Results: With a Probabilistic Latent Categoriser (PLC) we obtained 69% recall and 59% precision for relevant documents in a representative query. As the PLC technique provides the relative contribution of each term to the final document score, we used the Kullback-Leibler symmetric divergence to determine the most discriminating words for Swiss-Prot medical annotation. This information should allow curators to understand classification results better. It also has great value for fine-tuning the linguistic pre-processing of documents, which in turn can improve the overall classifier performance. Availability: The medical annotation dataset is available from the authors upon request. Contact: Pavel.Dobrokhotov@isb-sib.ch; Cyril.Goutte@xrce.xerox.com
- Published
- 2003
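The abstract of result 27 mentions using the symmetric Kullback-Leibler divergence to find the most discriminating words. The sketch below shows one common way to do this: decompose the symmetric divergence between two term distributions into per-term contributions and rank the terms. The abstract does not give the exact formulation used in the paper, so the decomposition, the smoothing constant `eps`, and the toy term probabilities are all assumptions for illustration.

```python
import math


def symmetric_kl_terms(p: dict, q: dict, eps: float = 1e-9) -> dict:
    """Per-term contribution to KL(p||q) + KL(q||p).

    Each term contributes (p_i - q_i) * log(p_i / q_i), which is never
    negative. eps is a small smoothing constant (an assumption) that
    avoids log(0) for terms missing from one distribution.
    """
    contrib = {}
    for term in set(p) | set(q):
        pi = p.get(term, 0.0) + eps
        qi = q.get(term, 0.0) + eps
        contrib[term] = (pi - qi) * math.log(pi / qi)
    return contrib


# Hypothetical term distributions for relevant and non-relevant documents.
relevant = {"mutation": 0.40, "disease": 0.40, "protein": 0.20}
other = {"mutation": 0.05, "disease": 0.15, "protein": 0.80}

contrib = symmetric_kl_terms(relevant, other)
ranked = sorted(contrib, key=contrib.get, reverse=True)
for term in ranked:
    print(f"{term}: {contrib[term]:.3f}")
```

Terms whose probability differs most between the two classes, in either direction, rise to the top of the ranking.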
28. UniProt: a hub for protein information
- Author
-
Ursula Hinz, Prudence Mutowo, Laure Verbregue, Weizhong Li, Nadine Gruaz-Gumowski, Chantal Hulo, Hermann Zellner, Shyamala Sundaram, P Lemercier, Guoying Qi, Parit Bansal, Tony Sawford, Sebastien Gehant, Delphine Baratin, Francesco Fazzini, Monica Pozzato, Séverine Duvaud, Lai-Su L. Yeh, Nicole Redaschi, Emma Hatton-Ellis, Darren A. Natale, Damien Lieberherr, Luis Figueira, Bernd Roechert, Borisas Bursteinas, Gayatri Chavali, Brigitte Boeckmann, Cristina Casal-Casas, Baris E. Suzek, Cathy H. Wu, Paul Gane, Ghislaine Argoud-Puy, Klemens Pichler, Rachael P. Huntley, Sangya Pundir, Alan Bridge, Edouard de Castro, Benoit Bely, Kristian B. Axelsen, Emmanuel Boutet, Andre Stutz, Penelope Garmiri, Christian J. A. Sigrist, John S. Garavelli, Rolf Apweiler, Peter B. McGarvey, Patrick Masson, Maria Jesus Martin, K Sonesson, Xavier Watkins, Ioannis Xenarios, Vladimir Volynkin, Hamish McWilliam, Mark Bingley, Guillaume Keller, Hongzhan Huang, Rabie Saidi, Sylvain Poux, Tunca Doğan, Yuqi Wang, Diego Poggioli, Rodrigo Lopez, Alistair MacDougall, Kati Laiho, Qinghua Wang, W Liu, Carlos Bonilla, Duncan Legge, C. R. Vinayaka, Anne Morgat, Thierry Lombardot, Jerven Bolleman, Nevila Nouspikel, Aleksandra Shypitsyna, Emanuele Alpi, Yongxing Chen, Anne Lise Veuthey, Andrew Nightingale, Béatrice A. Cuche, Alex Bateman, Ramona Britto, Alan Wilter Sousa da Silva, Jie Luo, Lionel Breuza, Marie Claude Blatter, Elena Cibrian-Uhalte, Michel Schneider, Chuming Chen, Michele Magrane, L Famiglietti, Meher Shruti Yerramalla, Lydie Bougueleret, Vivienne Baillie Gerritsen, Anne Estreicher, Dolnide Dornevil, Catherine Rivoire, Jian Zhang, S Staehli, Andrew Peter Cowley, Tony Wardell, Ivo Pedruzzi, Andrea H. Auchincloss, Salvo Paesano, Elisabeth Gasteiger, Luis Pureza, Marc Feuermann, Leslie Arminski, Xavier D. Martin, Teresa Batista Neto, Steven Rosanoff, Florence Jungo, Sandra Orchard, Claire O'Donovan, Elisabeth Coudert, Ricardo Antunes, Sandrine Pilbout, Vicente Lara, Arnaud Gos, Reija Hieta, Manuela Pruess, Joanna Arganiska, Edward Turner, Maurizio De Giorgi, M Doche, Cecilia N. Arighi, Michael Tognolli, Leyla Jael Garcia Castro, and Lucila Aimo
- Subjects
Proteome; Computer science; Molecular Sequence Annotation; Computational biology; Accession number (bioinformatics); DNA sequencing; World Wide Web; Identifier; Annotation; Sequence Analysis, Protein; Genetics; Database Issue; natural sciences; UniProt; Databases, Protein
- Abstract
UniProt is an important collection of protein sequences and their annotations, which has doubled in size to 80 million sequences during the past year. This growth in sequences has prompted an extension of UniProt accession number space from 6 to 10 characters. An increasing fraction of new sequences are identical to a sequence that already exists in the database with the majority of sequences coming from genome sequencing projects. We have created a new proteome identifier that uniquely identifies a particular assembly of a species and strain or subspecies to help users track the provenance of sequences. We present a new website that has been designed using a user-experience design process. We have introduced an annotation score for all entries in UniProt to represent the relative amount of knowledge known about each protein. These scores will be helpful in identifying which proteins are the best characterized and most informative for comparative analysis. All UniProt data is provided freely and is available on the web at http://www.uniprot.org/.
- Published
- 2014
29. Annotation of glycoproteins in the SWISS‐PROT database
- Author
-
Eva Jung, Amos Marc Bairoch, Anne-Lise Veuthey, and Elisabeth Gasteiger
- Subjects
chemistry.chemical_classification; Protein glycosylation; animal structures; Glycosylation; Database; Sequence database; macromolecular substances; Biology; Proteomics; computer.software_genre; Biochemistry; carbohydrates (lipids); Annotation; chemistry.chemical_compound; chemistry; Posttranslational modification; lipids (amino acids, peptides, and proteins); UniProt; Glycoprotein; Molecular Biology; computer
- Abstract
SWISS-PROT is a protein sequence database, which aims to be nonredundant, fully annotated and highly cross-referenced. Most eukaryotic gene products undergo co- and/or post-translational modifications, and these need to be included in the database in order to describe the mature protein. SWISS-PROT includes information on many types of different protein modifications. As glycosylation is the most common type of post-translational protein modification, we are currently placing an emphasis on annotation of protein glycosylation in SWISS-PROT. Information on the position of the sugar within the polypeptide chain, the reducing terminal linkage as well as additional information on biological function of the sugar is included in the database. In this paper we describe how we account for the different types of protein glycosylation, namely N-linked glycosylation, O-linked glycosylation, proteoglycans, C-linked glycosylation and the attachment of glycosyl-phosphatidylinositol anchors to proteins.
- Published
- 2001
30. Motifs tree: a new method for predicting post-translational modifications
- Author
-
Anne-Lise Veuthey, Jean-Luc Falcone, Bastien Chopard, and Christophe Charpilloz
- Subjects
Statistics and Probability; Computer science; Amino Acid Motifs; Decision tree; Machine learning; computer.software_genre; Biochemistry; Set (abstract data type); chemistry.chemical_compound; Software; Methionine; Protein methods; Acetyltransferases; Artificial Intelligence; Sequence Analysis, Protein; Humans; ddc:025.063; Molecular Biology; business.industry; Proteins; Acetylation; Computer Science Applications; Computational Mathematics; Tree (data structure); Computational Theory and Mathematics; chemistry; Posttranslational modification; Artificial intelligence; business; computer; Protein Processing, Post-Translational; Algorithms
- Abstract
Motivation: Post-translational modifications (PTMs) are important steps in the maturation of proteins. Several models exist to predict specific PTMs, from manually detected patterns to machine learning methods. On one hand, the manual detection of patterns does not provide the most efficient classifiers and requires an important workload, and on the other hand, models built by machine learning methods are hard to interpret and do not increase biological knowledge. Therefore, we developed a novel method based on patterns discovery and decision trees to predict PTMs. The proposed algorithm builds a decision tree, by coupling the C4.5 algorithm with genetic algorithms, producing high-performance white box classifiers. Our method was tested on the initiator methionine cleavage (IMC) and Nα-terminal acetylation (N-Ac), two of the most common PTMs. Results: The resulting classifiers perform well when compared with existing models. On a set of eukaryotic proteins, they display a cross-validated Matthews correlation coefficient of 0.83 (IMC) and 0.65 (N-Ac). When used to predict potential substrates of N-terminal acetyltransferase B and N-terminal acetyltransferase C, our classifiers display better performance than the state of the art. Moreover, we present an analysis of the model predicting IMC for Homo sapiens proteins and demonstrate that we are able to extract experimentally known facts without prior knowledge. Those results validate the fact that our method produces white box models. Availability and implementation: Predictors for IMC and N-Ac and all datasets are freely available at http://terminus.unige.ch/. Contact: jean-luc.falcone@unige.ch Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2014
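The abstract of result 30 reports classifier quality as a Matthews correlation coefficient (MCC). For readers unfamiliar with the measure, a minimal sketch of the standard binary MCC formula follows. The confusion-matrix counts are hypothetical and are not taken from the paper; only the formula itself is standard.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction). Returns 0.0 when any marginal is empty.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, chosen only to illustrate the calculation.
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {mcc:.3f}")
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, which makes it a reasonable summary when the positive and negative classes differ in size.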
31. The nutritive function of glia is regulated by signals released by neurons
- Author
-
Carol L. Poitry-Yamate, Philippe Perrottet, Serge Poitry, Marco Tsacopoulos, and Anne-Lise Veuthey
- Subjects
Male; Proline; Glutamic Acid; In Vitro Techniques; Biology; Models, Biological; Retina; Cellular and Molecular Neuroscience; Ammonia; medicine; Animals; Neurons; Alanine; Glutamate dehydrogenase; Glutamate receptor; Bees; Membrane transport; NAD; Cell biology; Glucose; medicine.anatomical_structure; nervous system; Neurology; Biochemistry; Neuroglia; Photoreceptor Cells, Invertebrate; NAD+ kinase; Neuron; NADP; Signal Transduction; Astrocyte
- Abstract
The idea of a metabolic coupling between neurons and astrocytes in the brain has been entertained for about 100 years. The use recently of simple and well-compartmentalized nervous systems, such as the honeybee retina or purified preparations of neurons and glia, provided strong support for a nutritive function of glial cells: glial cells transform glucose to a fuel substrate taken up and used by neurons. Particularly, in the honeybee retina, photoreceptor-neurons consume alanine supplied by glial cells and exogenous proline. NH4+ and glutamate are transported into glia by functional plasma membrane transport systems. During increased activity a transient rise in the intraglial concentration of NH4+ or of glutamate causes a net increase in the level of reduced nicotinamide adenine dinucleotides [NAD(P)H]. Quantitative biochemistry showed that this is due to activation of glycolysis in glial cells by the direct action of NH4+ and of glutamate, probably on the enzymatic reactions controlled by phosphofructokinase, alanine aminotransferase and glutamate dehydrogenase. This activation leads to a massive increase in the production and release of alanine by glia. This constitutes an intracellular signal and it depends upon the rate of conversion of NH4+ and of glutamate to alanine and alpha-ketoglutarate, respectively, in the glial cells. Alanine and alpha-ketoglutarate are released extracellularly and then taken up by neurons where they contribute to the maintenance of the mitochondrial redox potential. This signaling raises the novel hypothesis of a tight regulation of the nutritive function of glia.
- Published
- 1997
32. Application of text-mining for updating protein post-translational modification annotation in UniProtKB
- Author
-
Johanna McEntyre, Lydie Bougueleret, Patrick Ruch, Alan Bridge, Ioannis Xenarios, Julien Gobeill, and Anne-Lise Veuthey
- Subjects
Proteomics; Computer science; Process (engineering); Knowledge Bases; 0206 medical engineering; 02 engineering and technology; Scientific literature; computer.software_genre; Biochemistry; Manual curation; 03 medical and health sciences; Annotation; Structural Biology; Data Mining; Humans; Databases, Protein; Molecular Biology; 030304 developmental biology; 0303 health sciences; Information retrieval; Applied Mathematics; Methodology Article; Data Mining/methods; Molecular Sequence Annotation; Protein Processing, Post-Translational; Computer Science Applications; Task (computing); Information extraction; Protein processing; Posttranslational modification; Data mining; UniProt; DNA microarray; computer; 020602 bioinformatics
- Abstract
The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large scale proteomics experiments. The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, providing that a thorough understanding of the working process and requirements are first obtained. This system can be accessed at http://eagl.unige.ch/PTM/ .
- Published
- 2013
33. Text mining for the biocuration workflow
- Author
-
Cathy H. Wu, Anália Lourenço, Martin Krallinger, Andrew Chatr-aryamontri, Lynette Hirschman, Eva Huala, Robert J Nash, Anne-Lise Veuthey, Gully A. P. C. Burns, Karen G. Dowell, Cecilia N. Arighi, K. Bretonnel Cohen, Thomas C. Wiegers, Andrew G. Winter, and Alfonso Valencia
- Subjects
Prioritization; Biomedical Research; Databases, Factual; Computer science; Biological database; General Biochemistry, Genetics and Molecular Biology; Workflow; World Wide Web; 03 medical and health sciences; 0302 clinical medicine; Text mining; Animals; Data Mining; Humans; Biocurator; 030304 developmental biology; Natural Language Processing; 0303 health sciences; Science & Technology; business.industry; Search engine indexing; Data science; Biomedical text mining; 3. Good health; Identification (information); Original Article; General Agricultural and Biological Sciences; business; 030217 neurology & neurosurgery; Information Systems
- Abstract
Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
Funding: National Science Foundation (grant IIS-0844419 to L.H.); US National Institutes of Health National Library of Medicine (grant 1G08LM10720-01 to C.N.A. and C.H.W.); Work related to BioCreative III was supported by the US National Science Foundation (grant DBI-0850319 to C.N.A., L.H., C.H.W.); the US National Institute of General Medical Sciences (grant R01-GM083871 to G.A.P.C.B.); the National Science Foundation (DBI-0849977 to G.A.P.G.B); the European Union Seventh Framework MICROME project (Grant Agreement Number 222886-2 to M.K. and A.V.); the US National Science Foundation IGERT (Grant 0221625 to K.G.D) and a PhRMA Foundation predoctoral fellowship in informatics; US National Science Foundation (grant DBI-0850219 to E.H.); US National Human Genome Research Institute (grant HG001315 to R.N.); National Institutes of Health (NIH) (grant 2U01HG02712-04 to A.L.V.) and European Commission contract FELICS (grant 021902RII3); National Institute of Environmental Health Sciences (NIEHS) and the National Library of Medicine (NLM) (R01ES014065 to T.W.); NIEHS (R01ES014065-04S1 to T.W.); National Institutes of Health National Center for Research Resources (P20RR016463 to T.W.); Biotechnology and Biological Sciences Research Council of the UK (grant BB/F010486/1 to A.G.W); the National Institutes of Health National Center for Research Resources (1R01RR024031 to A.G.W); the European Commission FP7 Program (2007223411 to A.G.W). Funding for open access charge: The MITRE Corporation.
- Published
- 2012
34. Glial cells transform glucose to alanine, which fuels the neurons in the honeybee retina
- Author
-
M. Tsacopoulos, Anne-Lise Veuthey, G Tsoupras, Philippe Perrottet, and SG Saravelos
- Subjects
Light ,Proline ,Biology ,Retina ,Substrate Specificity ,chemistry.chemical_compound ,medicine ,Animals ,Amino Acids ,Phosphorylation ,Transaminases ,Neurons ,Alanine ,chemistry.chemical_classification ,Cahill cycle ,Glycogen ,General Neuroscience ,Nervous tissue ,Glutamate receptor ,Articles ,Metabolism ,Bees ,Mitochondria ,Amino acid ,Glucose ,medicine.anatomical_structure ,chemistry ,Biochemistry ,Neuroglia ,Photoreceptor Cells, Invertebrate ,Photic Stimulation - Abstract
The retina of honeybee drone is a nervous tissue with a crystal-like structure in which glial cells and photoreceptor neurons constitute two distinct metabolic compartments. The phosphorylation of glucose and its subsequent incorporation into glycogen occur in glia, whereas O2 consumption (QO2) occurs in the photoreceptors. Experimental evidence showed that glia phosphorylate glucose and supply the photoreceptors with metabolic substrates. We aimed to identify these transferred substrates. Using ion-exchange and reversed-phase HPLC and gas chromatography-mass spectrometry, we demonstrated that more than 50% of 14C(U)-glucose entering the glia is transformed to alanine by transamination of pyruvate with glutamate. In the absence of extracellular glucose, glycogen is used to make alanine; thus, its pool size in isolated retinas is maintained stable or even increased. Our model proposes that the formation of alanine occurs in the glia, thereby maintaining the redox potential of this cell and contributing to NH3 homeostasis. Alanine is released into the extracellular space and is then transported into photoreceptors using an Na(+)-dependent transport system. Purified suspensions of photoreceptors have similar alanine aminotransferase activity as glial cells and transform 14C- alanine to glutamate, aspartate, and CO2. Therefore, the alanine entering photoreceptors is transaminated to pyruvate, which in turn enters the Krebs cycle. Proline also supplies the Krebs cycle by making glutamate and, in turn, the intermediate alpha-ketoglutarate. Light stimulation caused a 200% increase of QO2 and a 50% decrease of proline and of glutamate. Also, the production of 14CO2 from 14C-proline was increased. The use of these amino acids would sustain about half of the light-induced delta QO2, the other half being sustained by glycogen via alanine formation. The use of proline meets a necessary anaplerotic function in the Krebs cycle, but implies high NH3 production. 
The results showed that alanine formation fixes NH3 at a rate exceeding glutamine formation. This is consistent with the rise of a glial pool of alanine upon photostimulation. In conclusion, the results strongly support a nutritive function for glia.
- Published
- 1994
- Full Text
- View/download PDF
35. A Preliminary Study on the Prediction of Human Protein Functions
- Author
-
Amos Marc Bairoch, Anne-Lise Veuthey, Marco Pagni, Lydie Lane, and Guido Bologna
- Subjects
Gene ontology ,Computer science ,business.industry ,Supervised learning ,Semi-supervised learning ,Functional prediction ,computer.software_genre ,Machine learning ,Measure (mathematics) ,ComputingMethodologies_PATTERNRECOGNITION ,Human proteome project ,Benchmark (computing) ,Protein function prediction ,Data mining ,Artificial intelligence ,ddc:025.063 ,ddc:576 ,business ,computer - Abstract
In the human proteome, about 5,000 proteins lack experimentally validated functional information. In this work we propose to tackle the problem of human protein function prediction with three distinct supervised learning schemes: one-versus-all classification, tournament learning, and multi-label learning. Target values of the supervised learning models are represented by the nodes of a subset of the Gene Ontology, which is widely used as a benchmark for functional prediction. With an independent dataset including very difficult cases, the recall measure reached a reasonable performance for the first 50 ranked predictions, on average; however, average precision was quite low.
- Published
- 2011
- Full Text
- View/download PDF
36. Question answering for biology and medicine
- Author
-
Anne-Lise Veuthey, Julien Gobeill, Douglas Teodoro, Patrick Ruch, Christian Lovis, and E. Pasche
- Subjects
Information extraction ,Information retrieval ,Computer science ,Controlled vocabulary ,Question answering ,MEDLINE ,Biological database ,Subject (documents) ,UniProt ,computer.software_genre ,DrugBank ,computer - Abstract
Biomedical professionals have at their disposal a huge amount of data, such as literature, i.e. textual contents, or databases, i.e. structured contents. But when they have a question, they often have to deal with too many documents in order to efficiently find the appropriate answer in a reasonable time. We have developed a Question Answering system which aims to analyze the user's question, to retrieve the most relevant documents from MEDLINE, and to extract from these retrieved documents a list of candidate answers, ranked by confidence. These candidate answers are concepts issued from biomedical controlled vocabularies, such as the Medical Subject Headings (MeSH) for a first step, and are extracted from the most relevant documents with pattern matching strategies. For evaluation purposes, we apply the system on two biological databases, UniProt and DrugBank. From these resources, we generated two large benchmarks of 200 questions dealing respectively with diseases and proteins, and with diseases and drugs. For these 2 sets, the first candidate answer proposed by our system is respectively correct in 57% and in 68%, while respectively 70% and 75% of all answers to find are contained in the ten first proposed candidate answers. Despite the use of simple Information Extraction strategies, our system exploits the redundancy of information in literature in order to provide a powerful Question Answering system.
- Published
- 2009
- Full Text
- View/download PDF
37. Text mining for Swiss-Prot curation: A story of success and failure
- Author
-
Anne-Lise Veuthey, Violaine Pillet, Yum Lina Yip, and Patrick Ruch
- Subjects
General Materials Science - Published
- 2009
- Full Text
- View/download PDF
38. Text mining for Swiss-Prot curation: A story of success and failure
- Author
-
Patrick Ruch, Violaine Pillet, Anne-Lise Veuthey, and Yum Lina Yip
- Subjects
World Wide Web ,Text mining ,Bioinformatics ,business.industry ,Computer science ,General Materials Science ,UniProt ,business ,Data science - Abstract
A text mining group has been set up at the Swiss Institute of Bioinformatics with the objective of developing and adapting information retrieval and extraction tools to help Swiss-Prot curators in their daily annotation work. After more than 7 years of activity, this group has gathered a significant amount of experience about the needs for text mining in biocuration. The first observation we made is that there is no “in-a-box” solution that can satisfy every need. Each curator has his/her own strategy to find information from the literature and none of the existing information retrieval systems is able to compete with it, more for reasons of habit than for reasons of performance. Second observation: to be completely operative, an information retrieval system should be embedded in the annotation platform. For instance, it should be possible to copy/paste information, such as the article reference or some interesting sentences, directly in the database format. Most of the existing online programs are hardly adaptable for this task and their use usually results in additional editing efforts for the curators. From this observation, we can derive the fact that integrating text mining services is usually more costly than expected, since wrappers and user interfaces need significant development that is sometimes fairly user-specific. After noticing these problems in the design and use of a generic information retrieval system for the Swiss-Prot curators, we focused our effort on text mining applications for database update. The follow-up of the literature is essential in the process of database maintenance and there are needs for automatic information extraction tools on a large panel of topics. We developed several IE applications in the following fields: PTM information (phosphorylation, glycosylation, disulfide bridge); subcellular localization; variant/mutation detection and characterization; new sequences with enzymatic activities; and new characterization of enzymes. These tools are integrated into pipelines which follow the daily PubMed output and generate lists of selected abstracts with highlights on the relevant sentences. These procedures are done independently of the usual annotation workflow, so that curators can mine these preselected data whenever they work on database entry updates. To conclude, we have identified big challenges in text mining services after discussion with the curators. One of them is the detection of novel information, especially information related to a new function or a new characterization of a protein or one of its close homologues. We are currently working on this task in the framework of the collaborative project “EAGL”. Another challenge is definitely the large-scale screening of newly published full-text papers to complement the often incomplete information in abstracts. This becomes more and more indispensable, not really for the annotation of widely studied “hot” proteins, but to find new data on uncharacterized ones. For instance, when no gene name has been attributed to a sequence, the only way to retrieve information is to use the ORF names, which are never provided in abstracts. Finally, one should definitely stress that many of these information retrieval and extraction tasks could be greatly simplified by requiring metadata at article submission time, such as an official HGNC gene name or a UniProt reference.
- Published
- 2009
- Full Text
- View/download PDF
39. Using discourse analysis to improve text categorization in MEDLINE
- Author
-
Patrick Ruch, Antoine Geissbühler, Julien Gobeill, Frederic Lisacek, Imad Tbahriti, Anne-Lise Veuthey, and Alan R. Aronson
- Subjects
Medical Subject Headings ,Abstracting and Indexing ,MEDLINE ,Libraries, Digital ,Information Storage and Retrieval ,Natural Language Processing - Abstract
Automatic keyword assignment has been largely studied in medical informatics in the context of the MEDLINE database, both for helping search in MEDLINE and in order to provide an indicative "gist" of the content of an article. Automatic assignment of Medical Subject Headings (MeSH), which is formally an automatic text categorization task, has been proposed using different methods or combinations of methods, including machine learning (naïve Bayes, neural networks, etc.), linguistically-motivated methods (syntactic parsing, semantic tagging), or information retrieval. In the present study, we propose to evaluate the impact of the argumentative structures of scientific articles to improve the categorization effectiveness of a categorizer, which combines linguistically-motivated and information retrieval methods. Our argumentative categorizer, which uses representation levels inherited from the field of discourse analysis, is able to classify sentences of an abstract in four classes: PURPOSE; METHODS; RESULTS and CONCLUSION. For the evaluation, the OHSUMED collection, a sample of MEDLINE, is used as a benchmark. For each abstract in the collection, the result of the argumentative classifier, i.e. the labeling of each sentence with an argumentative class, is used to modify the original ranking of the MeSH categorizer. The most effective combination (+2%, p0.003) strongly overweights the METHODS section and moderately the RESULTS and CONCLUSION sections. Although modest, the improvement brought by argumentative features for text categorization confirms that discourse analysis methods could benefit text mining in scientific digital libraries.
- Published
- 2007
40. Retrieving mutation-specific information for human proteins in UniProt/Swiss-Prot Knowledgebase
- Author
-
Violaine Pillet, Anne-Lise Veuthey, Nathalie Lachenal, and Yum Lina Yip
- Subjects
Information retrieval ,Polymorphism, Genetic ,Computer science ,Single amino acid polymorphism ,Specific-information ,Knowledge Bases ,Computational Biology ,Proteins ,Biochemistry ,Computer Science Applications ,Amino Acid Substitution ,Terminology as Topic ,Mutation (genetic algorithm) ,Mutation ,Humans ,Regular expression ,Single amino acid ,UniProt ,Databases, Protein ,Molecular Biology ,Human proteins ,Software - Abstract
The UniProt/Swiss-Prot Knowledgebase records about 30,500 variants in 5,664 proteins (Release 52.2). Most of these variants are manually curated single amino acid polymorphisms (SAPs) with references to the literature. In order to keep the list of published documents related to SAPs up to date, an automatic information retrieval method is developed to recover texts mentioning SAPs. The method is based on the use of regular expressions (patterns) and rules for the detection and validation of mutations. When evaluated using a corpus of 9,820 PubMed references, the precision of the retrieval was determined to be 89.5% over all variants. It was also found that the use of nonstandard mutation nomenclature and sequence positional correction is necessary to retrieve a significant number of relevant articles. The method was applied to the 5,664 proteins with variants. This was performed by first submitting a PubMed query to retrieve articles using gene or protein names and a list of mutation-related keywords; the SAP detection procedure was then used to recover relevant documents. The method was found to be efficient in retrieving new references on known polymorphisms. New references on known SAPs will be rendered accessible to the public via the Swiss-Prot variant pages.
- Published
- 2007
41. Extracting key sentences with latent argumentative structuring
- Author
-
Patrick Ruch, Robert Baud, Christine Chichester, Antoine Geissbühler, Frédérique Lisacek, Johann Marty, Dietrich Rebholz-Schuhmann, Imad Tbahriti, and Anne-Lise Veuthey
- Subjects
MEDLINE ,Humans ,Bayes Theorem ,Natural Language Processing - Abstract
Key word assignment has been largely used in MEDLINE to provide an indicative "gist" of the content of articles. Abstracts are also used for this purpose. However, with usually more than 300 words, abstracts can still be regarded as long documents; therefore we design a system to select a unique key sentence. This key sentence must be indicative of the article's content and we assume that the abstract's conclusions are good candidates. We design and assess the performance of an automatic key sentence selector, which classifies sentences into 4 argumentative moves: PURPOSE, METHODS, RESULTS and CONCLUSION. We rely on Bayesian classifiers trained on automatically acquired data. Feature representation, selection and weighting are reported and classification effectiveness is evaluated on the four classes using confusion matrices. We also explore the use of simple heuristics to take the position of sentences into account. Recall, precision and F-scores are computed for the CONCLUSION class. For the CONCLUSION class, the F-score reaches 84%. Automatic argumentative classification is feasible on MEDLINE abstracts and should help user navigation in such repositories.
- Published
- 2005
42. Assisting medical annotation in Swiss-Prot using statistical classifiers
- Author
-
Eric Gaussier, Pavel B. Dobrokhotov, Anne-Lise Veuthey, and Cyril Goutte
- Subjects
Information retrieval ,Polymorphism, Genetic ,Computer science ,Statistics as Topic ,MEDLINE ,Genetic Diseases, Inborn ,Health Informatics ,Workload ,Annotation ,Research community ,Document filtering ,Humans ,UniProt ,Databases, Protein ,Classifier (UML) - Abstract
Bio-medical knowledge bases are valuable resources for the research community. Original scientific publications are the main source used to annotate them. Medical annotation in Swiss-Prot is specifically targeted at finding and extracting data about human genetic diseases and polymorphisms. Curators have to scan through hundreds of publications to select the relevant ones. This workload can be greatly reduced by using bio-text mining techniques. Using a combination of natural language processing (NLP) techniques and statistical classifiers, we achieve recall points of up to 84% on the potentially interesting documents and a precision of more than 96% in detecting irrelevant documents. Careful analysis of the document pre-processing chain allows us to measure the impact of some steps on the overall result, as well as test different classifier configurations. The best combination was used to create a prototype of a search and classification tool that is currently tested by the database curators.
- Published
- 2004
43. A probabilistic information retrieval approach to medical annotation in SWISS-PROT
- Author
-
Pavel B. Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, and Eric Gaussier
- Subjects
Information Storage and Retrieval ,Databases, Protein ,Switzerland ,Probability - Abstract
The goal of medical annotation of human proteins in Swiss-Prot is to add features specifically intended for researchers working on genetic diseases and polymorphisms. For this purpose, it is necessary to search through a vast number of publications containing relevant information. Promising results have been obtained by applying natural language processing and machine learning techniques to solve this problem. By using the Probabilistic Latent Categorizer on representative query sets, 69% recall and 59% precision were achieved for relevant documents. This classifier also rejected irrelevant abstracts with more than 96% precision. Better linguistic pre-processing of source documents can further improve such a computational approach.
- Published
- 2003
44. Automated annotation of microbial proteomes in SWISS-PROT
- Author
-
Elisabeth Coudert, Tania Lima, Paul J. Kersey, Alexandre Gattiker, Karine Michoud, Andrea H. Auchincloss, Christian J. A. Sigrist, Corinne Lachaize, Elisabeth Gasteiger, Marco Pagni, Catherine Rivoire, Amos Marc Bairoch, and Anne-Lise Veuthey
- Subjects
Proteome ,Molecular Sequence Data ,Vertebrate and Genome Annotation Project ,Biology ,Biochemistry ,Manual curation ,World Wide Web ,Annotation ,Bacterial Proteins ,Structural Biology ,Amino Acid Sequence ,ddc:576 ,Databases, Protein ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,Proteome/classification/physiology ,Information retrieval ,Sequence database ,business.industry ,Database Management Systems/standards/trends ,Organic Chemistry ,Bacterial Proteins/classification/physiology ,Automation ,Computational Mathematics ,Manual annotation ,Database Management Systems ,UniProt ,business ,Databases, Protein/classification/standards ,Genome, Bacterial - Abstract
Large-scale sequencing of prokaryotic genomes demands the automation of certain annotation tasks currently manually performed in the production of the SWISS-PROT protein knowledgebase. The HAMAP project, or 'High-quality Automated and Manual Annotation of microbial Proteomes', aims to integrate manual and automatic annotation methods in order to enhance the speed of the curation process while preserving the quality of the database annotation. Automatic annotation is only applied to entries that belong to manually defined orthologous families and to entries with no identifiable similarities (ORFans). Many checks are enforced in order to prevent the propagation of wrong annotation and to spot problematic cases, which are channelled to manual curation. The results of this annotation are integrated in SWISS-PROT, and a website is provided at http://www.expasy.org/sprot/hamap/.
- Published
- 2003
45. A novel hepatitis C virus (HCV) subtype from Somalia and its classification into HCV clade 3
- Author
-
Antoine Hadengue, Rafael Quadri, Karim Abid, Francesco Negro, and Anne-Lise Veuthey
- Subjects
Adult ,Male ,Genotype ,Hepatitis C virus ,Somalia ,Molecular Sequence Data ,Hepacivirus ,Biology ,Viral Nonstructural Proteins ,medicine.disease_cause ,Chronic hepatitis ,Viral Envelope Proteins ,Interferon ,Hepatitis C, Chronic/virology ,Virology ,Sequence Homology, Nucleic Acid ,medicine ,Humans ,Clade ,Phylogeny ,ddc:616 ,Molecular epidemiology ,Hepatology ,Phylogenetic tree ,Base Sequence ,Viral Core Proteins ,virus diseases ,Hepatitis C ,Sequence Analysis, DNA ,Hepatitis C, Chronic ,Middle Aged ,medicine.disease ,digestive system diseases ,Hepacivirus/classification/genetics ,Viral Nonstructural Proteins/genetics ,DNA, Viral ,Female ,Hepacivirus/classification ,Hepacivirus/genetics ,Viral Core Proteins/genetics ,Viral Envelope Proteins/genetics ,medicine.drug - Abstract
Hepatitis C virus (HCV) sequences from throughout the world have been grouped into six clades, based on recently proposed criteria. Here, the partial sequences and clade assignment are reported for three HCV isolates from chronic hepatitis C patients from Somalia, for whom conventional assays failed to identify the genotype. Phylogenetic analysis of the sequences of the core, envelope 1 and part of the non-structural 5b regions suggests that all three isolates belong to a distinct HCV genetic group, tentatively classified as subtype 3h. This novel HCV subtype shows the highest sequence similarity with HCV isolates from Indonesia. Despite the fact that these patients were infected with HCV clade 3, none of them responded to standard interferon treatment.
- Published
- 2000
46. Phylogenetic relationships of fungi, plantae, and animalia inferred from homologous comparison of ribosomal proteins
- Author
-
Anne-Lise Veuthey and Gabriel Bittar
- Subjects
Ribosomal Proteins ,Sequence alignment ,Biology ,Animal Population Groups ,Evolution, Molecular ,Phylogenetics ,Ribosomal protein ,Sequence Homology, Nucleic Acid ,Genetics ,Homologous chromosome ,Animals ,Computer Simulation ,Clade ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,Phylogenetic tree ,Models, Genetic ,Sequence Homology, Amino Acid ,Fungi ,Plants ,Evolutionary biology ,Outgroup ,Trichotomy (philosophy) ,Sequence Alignment - Abstract
The complete set of available ribosomal proteins was utilized, at both the peptidic and the nucleotidic level, to establish that plants and metazoans form two sister clades relative to fungi. Different phylogenetic inference methods are applied to the sequence data, using archeans as the outgroup. The evolutionary length of the internal branch within the eukaryotic crown trichotomy is demonstrated to be, at most, one-tenth of the evolutionary length of the branch leading to the cenancester of these three kingdoms.
- Published
- 1998
47. Answering Gene Ontology terms to proteomics questions by supervised macro reading in Medline
- Author
-
Julien Gobeill, Christian Lovis, Anne-Lise Veuthey, Douglas Teodoro, Emilie Pasche, and Patrick Ruch
- Subjects
Information retrieval ,Result set ,Computer science ,business.industry ,media_common.quotation_subject ,Semantic search ,Ontology (information science) ,Semantics ,Reading (process) ,Controlled vocabulary ,Question answering ,business ,Natural language ,media_common - Abstract
Motivation and Objectives Biomedical professionals have at their disposal a huge amount of literature. But when they have a precise question, they often have to deal with too many documents to efficiently find the appropriate answers in a reasonable time. Faced to this literature overload, the need for automatic assistance has been largely pointed out, and PubMed is argued to be only the beginning on how scientists use the biomedical literature (Hunter and Cohen, 2006). Ontology-based search engines began to introduce semantics in search results. These systems still display documents, but the user visualizes clusters of PubMed results according to concepts which were extracted from the abstracts. GoPubMed (Doms and Schroeder, 2005) and EBIMed (Rebholz-Schuhmann et al, 2007) are popular examples of such ontology-based search engines in the biomedical domain. Question Answering (QA) systems are argued to be the next generation of semantic search engines (Wren, 2011). QA systems no more display documents but directly concepts which were extracted from the search results; these concepts are supposed to answer the user’s question formulated in natural language. EAGLi (Gobeill et al, 2009), our locally developed system, is an example of such QA search engines. Thus, both ontology-based and QA search engines, share the crucial task of efficiently extracting concepts from the result set, i.e. a set of documents. This task is sometimes called macro reading, in contrast with micro reading – or classification, categorization – which is a traditional Natural Language Processing task that aims at extracting concepts from a single document (Mitchell et al, 2009). This paper focuses on macro reading of MEDLINE abstracts. Several experiments have been reported to find the best way to extract ontology terms out of a single MEDLINE abstract, i.e. micro reading. 
In particular, (Trieschnigg et al, 2009) compared the performances of six classification systems for reproducing the manual Medical Subject Headings (MeSH) annotation of a MEDLINE abstract. The evaluated systems included two morphosyntactic classifiers (sometimes also called thesaurus-based), which aim at literally finding ontology terms in the abstract by alignment of words, and a machine learning (or supervised) classifier, which aims at inferring the annotation from a knowledge base containing already annotated abstracts. The authors concluded that the machine learning approach outperformed the morphosyntactic ones. But the macro reading task is fundamentally different, as we look for the best way to extract then combine ontology terms from a set of MEDLINE abstracts. The issue investigated in this paper is: to what extent the differences observed between two classifiers for a micro reading task are observed for a macro reading one? In particular, the redundancy hypothesis claims that the redundancy in large textual collections such as the Web or MEDLINE tends to smoothe performance differences across classifiers (Lin, 2007). To address this question, we compared a morphosyntactic and a machine learning classifiers for both tasks, focusing on the extraction of Gene Ontology (GO) terms, a controlled vocabulary for the characterization of proteins functions. The micro reading task consisted in extracting GO terms from a single MEDLINE abstract, as in the Trieschnigg et al’s work; the macro reading task consisted in extracting GO terms from a set of MEDLINE abstracts in order to answer to proteomics questions asked to the EAGLi QA system.
- Published
- 2012
- Full Text
- View/download PDF
48. N-Terminal myristoylation predictions by ensembles of neural networks.
- Author
-
Guido Bologna, Cédric Yvon, Séverine Duvaud, and Anne-Lise Veuthey
- Published
- 2004
- Full Text
- View/download PDF
49. Annotation of post-translational modifications in the Swiss-Prot knowledge base.
- Author
-
Nathalie Farriol-Mathis, John S. Garavelli, Brigitte Boeckmann, Séverine Duvaud, Elisabeth Gasteiger, Alain Gateau, Anne-Lise Veuthey, and Amos Bairoch
- Published
- 2004
- Full Text
- View/download PDF
50. The adenylate kinase reaction acts as a frequency filter towards fluctuations of ATP utilization in the cell
- Author
-
Jörg W. Stucki and Anne-Lise Veuthey
- Subjects
Adenine Nucleotides ,Differential equation ,Chemistry ,Adenylate Kinase ,Phosphotransferases ,Organic Chemistry ,Biophysics ,Adenylate kinase ,Conductance ,Thermodynamics ,Rate equation ,Biochemistry ,Oxidative Phosphorylation ,Quantitative Biology::Subcellular Processes ,Kinetics ,Stochastic differential equation ,Adenosine Triphosphate ,Diffusion process ,Adenine nucleotide ,Frequency domain ,Animals - Abstract
The buffering ability of the adenylate kinase reaction with respect to the phosphate potential and the efficiency of oxidative phosphorylation in the presence of a fluctuating load conductance were studied by computer simulations. Fluctuations of the load conductance, i.e. of the irreversible ATP-utilizing reactions in the cell, were generated by integrating an Ornstein-Uhlenbeck diffusion process. This real or colored noise was then injected into the set of differential equations describing the rate laws for the changes of the adenine nucleotide concentrations based on a simple nonequilibrium thermodynamic model of oxidative phosphorylation. Numerical integration of this system of stochastic differential equations allowed us to investigate the influence of different parameters on the performance of this energy converter. Probability density estimates revealed that the variance of the efficiency about its optimal value was significantly reduced by the adenylate kinase reaction. It was found that the buffering ability of this enzyme is restricted to a specific frequency domain of the fluctuations of the load conductance. This frequency filtering was confirmed by substituting the random fluctuations of the load conductance by simple sinusoidal perturbations. All these studies revealed that for each domain of frequencies of the load perturbations there exists an optimal activity of the adenylate kinase which minimizes deviations from optimal efficiency of oxidative phosphorylation.
- Published
- 1987
- Full Text
- View/download PDF