25 results on '"Contrino, S."'
Search Results
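2. Nutrition knowledge in a sample of caregivers (professional and non-professional) at a social and health care centre [Conocimientos en nutrición en una muestra de cuidadores (profesionales y no profesionales) de un centro sociosanitario]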
- Author
-
Iglesias, L., primary, Steegman, L., additional, Leon, R., additional, Contrino, S., additional, Soto, S., additional, Santiago, R., additional, and Bermejo, J.C., additional
- Published
- 2018
3. Making linked data SPARQL with the InterMine biological data warehouse
- Author
-
Déraspe, M., Binkley, G., Butano, D., Chadwick, M., Cherry, J.M., Clark-Casey, J., Contrino, S., Corbeil, Jacques, Heimbach, J., Karra, K., Lyne, R., Sullivan, J., Yehudi, Y., Micklem, G., and Dumontier, M.
- Abstract
InterMine is a system for integrating, analysing, and republishing biological data from multiple sources. It provides access to these data via a web user interface and programmatic web services. However, the precise invocation of services and subsequent exploration of returned data require substantial expertise on the structure of the underlying database. Here, we describe an approach that uses Semantic Web technologies to make InterMine data more broadly accessible and reusable, in accordance with the FAIR principles. We describe a pipeline to extract, transform, and load a Linked Data representation of the InterMine store. We use Docker to bring together SPARQL-aware applications to search, browse, explore, and query the InterMine-based data. Our work therefore extends interoperability of the InterMine platform, and supports new query functionality across InterMine installations and the network of open Linked Data.
- Published
- 2016
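The extract, transform, and load pipeline mentioned in the abstract above can be illustrated with a minimal sketch of the transform step only. This is not the authors' pipeline: the `http://example.org/mine/` namespace, the record layout, and the `to_ntriples` helper are all hypothetical and chosen purely for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace; a real deployment would choose its own URIs.
BASE = "http://example.org/mine/"


def to_ntriples(records):
    """Turn flat records into N-Triples lines (an illustrative transform step)."""
    lines = []
    for rec in records:
        # Each record becomes one subject URI built from its class and id.
        subject = f"<{BASE}{quote(rec['class'])}/{quote(str(rec['id']))}>"
        for key, value in rec.items():
            if key in ("class", "id"):
                continue
            predicate = f"<{BASE}vocab/{quote(key)}>"
            # Escape backslashes, quotes and newlines inside the literal.
            literal = (
                str(value)
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Toy rows standing in for data extracted from a warehouse.
rows = [{"class": "Gene", "id": 1, "symbol": "zen", "organism": "D. melanogaster"}]
triples = to_ntriples(rows)
```

The resulting lines could then be loaded into any SPARQL-aware triple store; which store and vocabulary the published pipeline uses is not stated in this record.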
4. Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project
- Author
-
Auerbach, R. K., Clawson, H., Niu, W., Van Nostrand, E. L., Morris, M., Brennan, J., Lu, Z. J., Gerstein, M. B., Cheung, M.-S., Alexander, R. P., Arshinoff, B. I., Cheng, C., Barber, G., Robilotto, R., Perry, M., Chateigner, A., Dernburg, A. F., Ikegami, K., Brdlik, C. M., Rechtsteiner, A., Alves, P., Brouillet, J. J., Contrino, S., Agarwal, A., Vielle, A., Carr, A., Leng, J., Dannenberg, L. O., Liu, T., Rhrissorrakrai, K., Yip, K. Y., and Feng, X.
- Abstract
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
- Published
- 2010
5. modMine: flexible access to modENCODE data
- Author
-
Contrino, S., primary, Smith, R. N., additional, Butano, D., additional, Carr, A., additional, Hu, F., additional, Lyne, R., additional, Rutherford, K., additional, Kalderimis, A., additional, Sullivan, J., additional, Carbon, S., additional, Kephart, E. T., additional, Lloyd, P., additional, Stinson, E. O., additional, Washington, N. L., additional, Perry, M. D., additional, Ruzanov, P., additional, Zha, Z., additional, Lewis, S. E., additional, Stein, L. D., additional, and Micklem, G., additional
- Published
- 2011
6. The ArrayExpress gene expression database: a software engineering and implementation perspective
- Author
-
Sarkans, U., primary, Parkinson, H., additional, Lara, G. G., additional, Oezcimen, A., additional, Sharma, A., additional, Abeygunawardena, N., additional, Contrino, S., additional, Holloway, E., additional, Rocca-Serra, P., additional, Mukherjee, G., additional, Shojatalab, M., additional, Kapushesky, M., additional, Sansone, S.-A., additional, Farne, A., additional, Rayner, T., additional, and Brazma, A., additional
- Published
- 2004
7. Super ESCA: First beamline operating at ELETTRA
- Author
-
Abrami, A., primary, Barnaba, M., additional, Battistello, L., additional, Bianco, A., additional, Brena, B., additional, Cautero, G., additional, Chen, Q. H., additional, Cocco, D., additional, Comelli, G., additional, Contrino, S., additional, DeBona, F., additional, Di Fonzo, S., additional, Fava, C., additional, Finetti, P., additional, Furlan, P., additional, Galimberti, A., additional, Gambitta, A., additional, Giuressi, D., additional, Godnig, R., additional, Jark, W., additional, Lizzit, S., additional, Mazzolini, F., additional, Melpignano, P., additional, Olivi, L., additional, Paolucci, G., additional, Pugliese, R., additional, Qian, S. N., additional, Rosei, R., additional, Sandrin, G., additional, Savoia, A., additional, Sergo, R., additional, Sostero, G., additional, Tommasini, R., additional, Tudor, M., additional, Vivoda, D., additional, Wei, F.‐Q., additional, and Zanini, F., additional
- Published
- 1995
8. The role SWISS-PROT and TrEMBL play in the genome research environment
- Author
-
Junker, V., Contrino, S., Fleischmann, W., Hermjakob, H., Lang, F., Magrane, M., Martin, M. Jesus, Mitaritonna, N., O'Donovan, C., and Apweiler, R.
- Published
- 2000
9. Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL
- Author
-
Apweiler R, Gateau A, Contrino S, Martin MJ, Junker V, O'Donovan C, Lang F, Mitaritonna N, Kappus S, and Bairoch A
- Subjects
- Genome, Databases, Factual, Software Design, Proteins/genetics, Proteins, Amino Acid Sequence, ddc:576, Software
- Abstract
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROT's. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.
10. Sharing sensitive data in life sciences: an overview of centralized and federated approaches.
- Author
-
Rujano MA, Boiten JW, Ohmann C, Canham S, Contrino S, David R, Ewbank J, Filippone C, Connellan C, Custers I, van Nuland R, Mayrhofer MT, Holub P, Álvarez EG, Bacry E, Hughes N, Freeberg MA, Schaffhauser B, Wagener H, Sánchez-Pla A, Bertolini G, and Panagiotopoulou M
- Subjects
- Humans, Medical Informatics methods, Information Dissemination, Biological Science Disciplines
- Abstract
Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. Due to the potential sensitivity of such data, however, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale and real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, while federated models facilitate scaling up and legal compliance, as the data typically reside on the data generator's premises, allowing for better control of how data are shared. This comparative study thus offers guidance on the selection of the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts. (© The Author(s) 2024. Published by Oxford University Press.)
- Published
- 2024
11. HumanMine: advanced data searching, analysis and cross-species comparison.
- Author
-
Lyne R, Bazaga A, Butano D, Contrino S, Heimbach J, Hu F, Kalderimis A, Lyne M, Reierskog K, Stepan R, Sullivan J, Wise A, Yehudi Y, and Micklem G
- Subjects
- Databases, Factual, Humans, Proteomics, Genome, Human, Information Storage and Retrieval
- Abstract
HumanMine (www.humanmine.org) is an integrated database of human genomics and proteomics data that provides a powerful interface to support sophisticated exploration and analysis of data compiled from experimental, computational and curated data sources. Built using the InterMine data integration platform, HumanMine includes genes, proteins, pathways, expression levels, single nucleotide polymorphisms (SNPs), diseases and more, integrated into a single searchable database. HumanMine promotes integrative analysis, a powerful approach in modern biology that allows many sources of evidence to be analysed together. The data can be accessed through a user-friendly web interface as well as a powerful, scriptable web service application programming interface (API) to allow programmatic access to data. The web interface includes a useful identifier resolution system, sophisticated query options and interactive results tables that enable powerful exploration of data, including data summaries, filtering, browsing and export. A set of graphical analysis tools provides a rich environment for data exploration including statistical enrichment of sets of genes or other biological entities. HumanMine can be used for integrative multistaged analysis that can lead to new insights and uncover previously unknown relationships. Database URL: https://www.humanmine.org. (© The Author(s) 2022. Published by Oxford University Press.)
- Published
- 2022
12. ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery.
- Author
-
Krishnakumar V, Contrino S, Cheng CY, Belyaeva I, Ferlanti ES, Miller JR, Vaughn MW, Micklem G, Town CD, and Chan AP
- Subjects
- Arabidopsis Proteins metabolism, Computational Biology methods, Gene Ontology, Genomics methods, Information Storage and Retrieval methods, Internet, Protein Interaction Mapping methods, Protein Interaction Maps genetics, Reproducibility of Results, Sequence Analysis, RNA, Arabidopsis genetics, Arabidopsis Proteins genetics, Databases, Genetic, Gene Expression Profiling, Gene Expression Regulation, Plant genetics
- Abstract
ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled. (© The Author 2016. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2017
13. Cross-organism analysis using InterMine.
- Author
-
Lyne R, Sullivan J, Butano D, Contrino S, Heimbach J, Hu F, Kalderimis A, Lyne M, Smith RN, Štěpán R, Balakrishnan R, Binkley G, Harris T, Karra K, Moxon SA, Motenko H, Neuhauser S, Ruzicka L, Cherry M, Richardson J, Stein L, Westerfield M, Worthey E, and Micklem G
- Subjects
- Animals, Computational Biology methods, Databases, Genetic, Genomics, Humans, Internet, Systems Integration, User-Computer Interface, Databases, Factual, Software
- Abstract
InterMine is a data integration warehouse and analysis software system developed for large and complex biological data sets. Designed for integrative analysis, it can be accessed through a user-friendly web interface. For bioinformaticians, extensive web services as well as programming interfaces for most common scripting languages support access to all features. The web interface includes a useful identifier look-up system, and both simple and sophisticated search options. Interactive results tables enable exploration, and data can be filtered, summarized, and browsed. A set of graphical analysis tools provides a rich environment for data exploration including statistical enrichment of sets of genes or other entities. InterMine databases have been developed for the major model organisms, budding yeast, nematode worm, fruit fly, zebrafish, mouse, and rat together with a newly developed human database. Here, we describe how this has facilitated interoperation and development of cross-organism analysis tools and reports. InterMine as a data exploration and analysis tool is also described. All the InterMine-based systems described in this article are resources freely available to the scientific community. (© 2015 Wiley Periodicals, Inc.)
- Published
- 2015
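The "statistical enrichment of sets of genes" mentioned in this abstract can be illustrated with a small worked example. The abstract does not say which test is used, so the one-sided hypergeometric test below is an assumption (a common choice for gene-set enrichment), and the function name and toy numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Return P(X >= hits) under the hypergeometric distribution.

    population: number of genes in the background set
    annotated:  background genes that carry the term of interest
    sample:     number of genes in the user's list
    hits:       genes in the list that carry the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    # Sum the probability of seeing `hits` or more annotated genes by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 20 background genes, 5 annotated, a list of 4 with 2 hits.
p = enrichment_p_value(population=20, annotated=5, sample=4, hits=2)
```

In practice such p-values are usually corrected for multiple testing across terms; that step is omitted here.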
14. Araport: the Arabidopsis information portal.
- Author
-
Krishnakumar V, Hanlon MR, Contrino S, Ferlanti ES, Karamycheva S, Kim M, Rosen BD, Cheng CY, Moreira W, Mock SA, Stubbs J, Sullivan JM, Krampis K, Miller JR, Micklem G, Vaughn M, and Town CD
- Subjects
- Data Mining, Internet, Software, Arabidopsis genetics, Databases, Genetic, Genome, Plant
- Abstract
The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release 'modules' that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts 'science apps,' developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community. (© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2015
15. InterMine: extensive web services for modern biology.
- Author
-
Kalderimis A, Lyne R, Butano D, Contrino S, Lyne M, Heimbach J, Hu F, Smith R, Štěpán R, Sullivan J, and Micklem G
- Subjects
- Animals, Chromosomes chemistry, Humans, Internet, Mice, Sequence Analysis, DNA, User-Computer Interface, Databases, Factual, Software
- Abstract
InterMine (www.intermine.org) is a biological data warehousing system providing extensive automatically generated and configurable RESTful web services that underpin the web interface and can be re-used in many other applications: to find and filter data; export it in a flexible and structured way; to upload, use, manipulate and analyze lists; to provide services for flexible retrieval of sequence segments, and for other statistical and analysis tools. Here we describe these features and discuss how they can be used separately or in combinations to support integrative and comparative analysis. (© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2014
16. Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE.
- Author
-
Trinh QM, Jen FY, Zhou Z, Chu KM, Perry MD, Kephart ET, Contrino S, Ruzanov P, and Stein LD
- Subjects
- Chromatin Immunoprecipitation, Software
- Abstract
Background: Funded by the National Institutes of Health (NIH), the aim of the Model Organism ENCyclopedia of DNA Elements (modENCODE) project is to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for both model organisms C. elegans (worm) and D. melanogaster (fly). With a total size of just under 10 terabytes of data collected and released to the public, one of the challenges faced by researchers is to extract biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data has already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or combine modENCODE data with other data sets. Unfortunately this can be a time consuming and logistically challenging proposition. Results: In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience of use, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies. Conclusions: Using these resources provides a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to use more of their time using modENCODE data, and less time moving it around.
- Published
- 2013
17. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data.
- Author
-
Smith RN, Aleksic J, Butano D, Carr A, Contrino S, Hu F, Lyne M, Lyne R, Kalderimis A, Rutherford K, Stepan R, Sullivan J, Wakeling M, Watkins X, and Micklem G
- Subjects
- Algorithms, Data Mining, Genomics, Internet, Programming Languages, Computational Biology methods, Database Management Systems, Databases, Factual
- Abstract
Summary: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of 'widgets' performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages. Availability: Freely available from http://www.intermine.org under the LGPL license. Contact: g.micklem@gen.cam.ac.uk. Supplementary Information: Supplementary data are available at Bioinformatics online.
- Published
- 2012
18. modMine: flexible access to modENCODE data.
- Author
-
Contrino S, Smith RN, Butano D, Carr A, Hu F, Lyne R, Rutherford K, Kalderimis A, Sullivan J, Carbon S, Kephart ET, Lloyd P, Stinson EO, Washington NL, Perry MD, Ruzanov P, Zha Z, Lewis SE, Stein LD, and Micklem G
- Subjects
- Animals, Gene Expression, Genome, Helminth, Genome, Insect, Genomics, Internet, User-Computer Interface, Caenorhabditis elegans genetics, Databases, Genetic, Drosophila melanogaster genetics
- Abstract
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
- Published
- 2012
19. The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details.
- Author
-
Washington NL, Stinson EO, Perry MD, Ruzanov P, Contrino S, Smith R, Zha Z, Lyne R, Carr A, Lloyd P, Kephart E, McKay SJ, Micklem G, Stein LD, and Lewis SE
- Subjects
- Animals, Caenorhabditis elegans genetics, DNA genetics, Drosophila melanogaster genetics, Humans, Databases, Genetic, Genome, Genomics methods, Internet, Software
- Abstract
The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org.
- Published
- 2011
20. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project.
- Author
-
Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, Yip KY, Robilotto R, Rechtsteiner A, Ikegami K, Alves P, Chateigner A, Perry M, Morris M, Auerbach RK, Feng X, Leng J, Vielle A, Niu W, Rhrissorrakrai K, Agarwal A, Alexander RP, Barber G, Brdlik CM, Brennan J, Brouillet JJ, Carr A, Cheung MS, Clawson H, Contrino S, Dannenberg LO, Dernburg AF, Desai A, Dick L, Dosé AC, Du J, Egelhofer T, Ercan S, Euskirchen G, Ewing B, Feingold EA, Gassmann R, Good PJ, Green P, Gullier F, Gutwein M, Guyer MS, Habegger L, Han T, Henikoff JG, Henz SR, Hinrichs A, Holster H, Hyman T, Iniguez AL, Janette J, Jensen M, Kato M, Kent WJ, Kephart E, Khivansara V, Khurana E, Kim JK, Kolasinska-Zwierz P, Lai EC, Latorre I, Leahey A, Lewis S, Lloyd P, Lochovsky L, Lowdon RF, Lubling Y, Lyne R, MacCoss M, Mackowiak SD, Mangone M, McKay S, Mecenas D, Merrihew G, Miller DM 3rd, Muroyama A, Murray JI, Ooi SL, Pham H, Phippen T, Preston EA, Rajewsky N, Rätsch G, Rosenbaum H, Rozowsky J, Rutherford K, Ruzanov P, Sarov M, Sasidharan R, Sboner A, Scheid P, Segal E, Shin H, Shou C, Slack FJ, Slightam C, Smith R, Spencer WC, Stinson EO, Taing S, Takasaki T, Vafeados D, Voronina K, Wang G, Washington NL, Whittle CM, Wu B, Yan KK, Zeller G, Zha Z, Zhong M, Zhou X, Ahringer J, Strome S, Gunsalus KC, Micklem G, Liu XS, Reinke V, Kim SK, Hillier LW, Henikoff S, Piano F, Snyder M, Stein L, Lieb JD, and Waterston RH
- Subjects
- Animals, Caenorhabditis elegans growth & development, Caenorhabditis elegans metabolism, Caenorhabditis elegans Proteins genetics, Caenorhabditis elegans Proteins metabolism, Chromatin genetics, Chromatin metabolism, Chromatin ultrastructure, Computational Biology methods, Conserved Sequence, Evolution, Molecular, Gene Regulatory Networks, Genes, Helminth, Genomics methods, Histones metabolism, Models, Genetic, RNA, Helminth genetics, RNA, Helminth metabolism, RNA, Untranslated genetics, RNA, Untranslated metabolism, Regulatory Sequences, Nucleic Acid, Transcription Factors genetics, Transcription Factors metabolism, Caenorhabditis elegans genetics, Chromosomes genetics, Chromosomes metabolism, Chromosomes ultrastructure, Gene Expression Profiling, Gene Expression Regulation, Genome, Helminth, Molecular Sequence Annotation
- Abstract
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
- Published
- 2010
21. Plant-based microarray data at the European Bioinformatics Institute. Introducing AtMIAMExpress, a submission tool for Arabidopsis gene expression data to ArrayExpress.
- Author
-
Mukherjee G, Abeygunawardena N, Parkinson H, Contrino S, Durinck S, Farne A, Holloway E, Lilja P, Moreau Y, Oezcimen A, Rayner T, Sharma A, Brazma A, Sarkans U, and Shojatalab M
- Subjects
- Academies and Institutes, Computational Biology, Europe, Gene Expression Profiling, Internet, Oligonucleotide Array Sequence Analysis, Software, Triticum genetics, Arabidopsis genetics, Databases, Genetic
- Abstract
ArrayExpress is a public microarray repository founded on the Minimum Information About a Microarray Experiment (MIAME) principles that stores MIAME-compliant gene expression data. Plant-based data sets represent approximately one-quarter of the experiments in ArrayExpress. The majority are based on Arabidopsis (Arabidopsis thaliana); however, there are other data sets based on Triticum aestivum, Hordeum vulgare, and Populus subsp. AtMIAMExpress is an open-source Web-based software application for the submission of Arabidopsis-based microarray data to ArrayExpress. AtMIAMExpress exports data in MAGE-ML format for upload to any MAGE-ML-compliant application, such as J-Express and ArrayExpress. It was designed as a tool for users with minimal bioinformatics expertise, has comprehensive help and user support, and represents a simple solution to meeting the MIAME guidelines for the Arabidopsis community. Plant data are queryable both in ArrayExpress and in the Data Warehouse databases, which support queries based on gene-centric and sample-centric annotation. The AtMIAMExpress submission tool is available at http://www.ebi.ac.uk/at-miamexpress/. The software is open source and is available from http://sourceforge.net/projects/miamexpress/. For information, contact miamexpress@ebi.ac.uk.
- Published
- 2005
22. The ArrayExpress gene expression database: a software engineering and implementation perspective.
- Author
-
Sarkans U, Parkinson H, Lara GG, Oezcimen A, Sharma A, Abeygunawardena N, Contrino S, Holloway E, Rocca-Serra P, Mukherjee G, Shojatalab M, Kapushesky M, Sansone SA, Farne A, Rayner T, and Brazma A
- Subjects
- Algorithms, Information Dissemination methods, Database Management Systems, Databases, Genetic, Gene Expression Profiling methods, Information Storage and Retrieval methods, Oligonucleotide Array Sequence Analysis methods, Proteins genetics, Proteins metabolism, Software
- Abstract
Motivation: The lack of microarray data management systems and databases is still one of the major problems faced by many life sciences laboratories. While developing the public repository for microarray data ArrayExpress we had to find novel solutions to many non-trivial software engineering problems. Our experience will be both relevant and useful for most bioinformaticians involved in developing information systems for a wide range of high-throughput technologies. Results: ArrayExpress has been online since February 2002, growing exponentially to well over 10,000 hybridizations (as of September 2004). It has been demonstrated that our chosen design and implementation works for databases aimed at storage, access and sharing of high-throughput data. Availability: The ArrayExpress database is available at http://www.ebi.ac.uk/arrayexpress/. The software is open source. Contact: ugis@ebi.ac.uk.
- Published
- 2005
23. COMe: the ontology of bioinorganic proteins.
- Author
-
Degtyarenko K and Contrino S
- Subjects
- Binding Sites, Computational Biology, Internet, Ligands, Terminology as Topic, User-Computer Interface, Databases, Protein/standards, Metals/chemistry, Proteins/chemistry, Proteins/classification
- Abstract
Background: Many characterised proteins contain metal ions, small organic molecules or modified residues. In contrast, the huge amount of data generated by genome projects consists exclusively of sequences with almost no annotation. One of the goals of the structural genomics initiative is to provide representative three-dimensional (3-D) structures for as many protein/domain folds as possible to allow successful homology modelling. However, important functional features such as metal co-ordination or a type of prosthetic group are not always conserved in homologous proteins. So far, the problem of correct annotation of bioinorganic proteins has been largely ignored by the bioinformatics community and information on bioinorganic centres obtained by methods other than crystallography or NMR is only available in literature databases. Results: COMe (Co-Ordination of Metals) represents the ontology for bioinorganic and other small molecule centres in complex proteins. COMe consists of three types of entities: 'bioinorganic motif' (BIM), 'molecule' (MOL), and 'complex proteins' (PRX), with each entity being assigned a unique identifier. A BIM consists of at least one centre (metal atom, inorganic cluster, organic molecule) and two or more endogenous and/or exogenous ligands. BIMs are represented as one-dimensional (1-D) strings and 2-D diagrams. A MOL entity represents a 'small molecule' which, when in complex with one or more polypeptides, forms a functional protein. The PRX entities refer to the functional proteins as well as to separate protein domains and subunits. The complex proteins in COMe are subdivided into three categories: (i) metalloproteins, (ii) organic prosthetic group proteins and (iii) modified amino acid proteins. The data are currently stored in both XML format and a relational database and are available at http://www.ebi.ac.uk/come/. Conclusion: COMe provides the classification of proteins according to their 'bioinorganic' features and thus is orthogonal to other classification schemes, such as those based on sequence similarity, 3-D fold, enzyme activity, or biological process. The hierarchical organisation of the controlled vocabulary allows both for annotation and querying at different levels of granularity.
- Published
- 2004
24. ArrayExpress: a public database of gene expression data at EBI.
- Author
-
Rocca-Serra P, Brazma A, Parkinson H, Sarkans U, Shojatalab M, Contrino S, Vilo J, Abeygunawardena N, Mukherjee G, Holloway E, Kapushesky M, Kemmeren P, Lara GG, Oezcimen A, and Sansone SA
- Subjects
- Computational Biology, Databases, Genetic, Gene Expression, Oligonucleotide Array Sequence Analysis
- Abstract
ArrayExpress is a public repository for microarray-based gene expression data, resulting from the implementation of the MAGE object model to ensure accurate data structuring and the MIAME standard, which defines the annotation requirements. ArrayExpress accepts data as MAGE-ML files for direct submissions or data from MIAMExpress, the MIAME-compliant web-based annotation and submission tool of EBI. A team of curators supports the submission process, providing assistance in data annotation. Data retrieval is performed through a dedicated web interface. Relevant results may be exported to ExpressionProfiler, the EBI-based expression analysis tool available online (http://www.ebi.ac.uk/arrayexpress).
- Published
- 2003
25. Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.
- Author
-
Apweiler R, Gateau A, Contrino S, Martin MJ, Junker V, O'Donovan C, Lang F, Mitaritonna N, Kappus S, and Bairoch A
- Subjects
- Amino Acid Sequence, Genome, Software, Software Design, Databases, Factual, Proteins/genetics
- Abstract
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROT's. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.
- Published
- 1997