30 results for "Stuart R. Miyasato"
Search Results
2. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.
- Author
Benjamin C Hitz, Laurence D Rowe, Nikhil R Podduturi, David I Glick, Ulugbek K Baymuradov, Venkat S Malladi, Esther T Chan, Jean M Davidson, Idan Gabdank, Aditi K Narayana, Kathrina C Onate, Jason Hilton, Marcus C Ho, Brian T Lee, Stuart R Miyasato, Timothy R Dreszer, Cricket A Sloan, J Seth Strattan, Forrest Y Tanaka, Eurie L Hong, and J Michael Cherry
- Subjects
Medicine; Science
- Abstract
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.
- Published
- 2017
3. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources
- Author
Kalpana Karra, Shuai Weng, Stuart R. Miyasato, Matt Simison, J. Michael Cherry, Marek S. Skrzypek, Suzi Aleksander, Robert S. Nash, Edith D. Wong, Stacia R. Engel, Eric Douglass, and Micheal Alexander
- Subjects
biology; ved/biology; Saccharomyces cerevisiae; ved/biology.organism_classification_rank.species; Computational biology; biology.organism_classification; Genome; Saccharomyces; Annotation; Alliance; Databases, Genetic; Genetics; Humans; Homology (anthropology); Genome, Fungal; Allele; Model organism; Alleles; Reference genome
- Abstract
Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.
- Published
- 2021
4. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal
- Author
Casey Litton, Zachary Myers, Ulugbek K. Baymuradov, Benjamin C. Hitz, Meenakshi S. Kagda, Otto Jolanki, Jin-Wook Lee, Stuart R. Miyasato, Keenan Graham, Idan Gabdank, Forrest Y. Tanaka, Bonita R. Lam, J. Seth Strattan, Jason A. Hilton, J. Michael Cherry, Yunhai Luo, Philip Adenekan, Paul Sud, Emma O'Neill, Jennifer Jou, and Khine Lin
- Subjects
Interoperability; Cloud computing; Data_CODINGANDINFORMATIONTHEORY; Biology; ENCODE; World Wide Web; Mice; 03 medical and health sciences; 0302 clinical medicine; Documentation; Software; Databases, Genetic; Genetics; Database Issue; Animals; Humans; 030304 developmental biology; 0303 health sciences; Genome, Human; business.industry; DNA; Genomics; Visualization; Open data; Encyclopedia; business; 030217 neurology & neurosurgery
- Abstract
The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.
- Published
- 2019
5. Transcriptome visualization and data availability at the Saccharomyces Genome Database
- Author
Stuart R. Miyasato, Sagar Jha, Suzi Aleksander, Kalpana Karra, Barbara Dunn, Patrick Ng, Stacia R. Engel, Marek S. Skrzypek, Matt Simison, Robert S. Nash, Joanna Argasinska, Felix Gondwe, Edith D. Wong, Kevin A. MacPherson, J. Michael Cherry, and Shuai Weng
- Subjects
Saccharomyces cerevisiae Proteins; RNA-Seq; Genomics; Genome browser; Computational biology; Saccharomyces cerevisiae; Biology; Web Browser; Genome; Genome engineering; 03 medical and health sciences; Open Reading Frames; User-Computer Interface; Reference Values; Databases, Genetic; Genetics; Database Issue; Protein Isoforms; Gene; 030304 developmental biology; 0303 health sciences; 030302 biochemistry & molecular biology; Computational Biology; Molecular Sequence Annotation; Genome, Fungal; Transcriptome; Reference genome
- Abstract
The Saccharomyces Genome Database (SGD; www.yeastgenome.org) maintains the official annotation of all genes in the Saccharomyces cerevisiae reference genome and aims to elucidate the function of these genes and their products by integrating manually curated experimental data. Technological advances have allowed researchers to profile RNA expression and identify transcripts at high resolution. These data can be configured in web-based genome browser applications for display to the general public. Accordingly, SGD has incorporated published transcript isoform data in our instance of JBrowse, a genome visualization platform. This resource will help clarify S. cerevisiae biological processes by furthering studies of transcriptional regulation, untranslated regions, genome engineering, and expression quantification in S. cerevisiae.
- Published
- 2019
6. Alliance of Genome Resources Portal: unified model organism research platform
- Author
Adam Wright, Paul W. Sternberg, Daniela Raciti, Monika Tutaj, Josh Goodman, Ken Frazer, Paul Thomas, Scott Cain, Raymond Lee, Judith A. Blake, Patrick Kalita, Ajay Shrivatsav, Julie Agapite, Marek S. Skrzypek, Hans-Michael Mueller, Wen J. Chen, Karen Yook, Gillian Millburn, Joanna Argasinska, David Fashena, Kevin Schaper, Joel E. Richardson, Douglas G. Howe, Barbara Dunn, Yvonne M. Bradford, Nathan Dunn, Jaehyoung Cho, Ranjana Kishore, Kalpana Karra, Sabrina Toro, Anne E. Eagle, Norbert Perrimon, Anushya Muruganujan, Beverley B. Matthews, Christian A. Grove, Edith D. Wong, Monte Westerfield, Olin Blodgett, Gary Williams, Jose-Maria Urbano, Marie-Claire Harrison, Steven J Marygold, Tremayne Mushayahama, Marek Tutaj, Susan Russo Gelbart, Jennifer R. Smith, Felix Gondwe, Dustin Ebert, Juancarlos Chan, J. Michael Cherry, Ceri E. Van Slyke, Christopher J. Tabone, L. Sian Gramates, Madeline A. Crosby, Robert S. Nash, Kevin A. MacPherson, Patrick Ng, Christian Pich, Suzi Aleksander, Monika Tomczuk, Brian R. Calvi, Todd W. Harris, Cynthia L. Smith, Stan Laulederkind, Jyothi Thota, Gilberto dos Santos, Matt Simison, Kimberly Van Auken, Mary E. Dolan, Karen R. Christie, Stacia R. Engel, Leyla Ruzicka, Carol J. Bult, Kevin L. Howe, Stuart R. Miyasato, Shur-Jen Wang, David R. Shaw, Mary Shimoyama, Valerio Arnaboldi, Matthew Russell, Michael Paulini, Sibyl Gao, Sagar Jha, Jeff De Pons, Christopher J. Mungall, Seth Carbon, James A. Kadin, Sierra A. T. Moxon, Susan M. Bello, Thomas C. Kaufman, Laurent-Philippe Albou, Shuai Weng, and Helen Attrill
- Subjects
NAR Breakthrough Article; Saccharomyces cerevisiae; Biology; Genome; Data modeling; Mice; User-Computer Interface; 03 medical and health sciences; 0302 clinical medicine; Resource (project management); Databases, Genetic; Genetics; Animals; Humans; Caenorhabditis elegans; Alleles; Zebrafish; Organism; 030304 developmental biology; Internet; 0303 health sciences; Genome, Human; Computational Biology; Genomics; Data science; Rats; Variety (cybernetics); Drosophila melanogaster; Gene Ontology; Data access; Alliance; Workflow; Software; 030217 neurology & neurosurgery
- Abstract
The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology that is guided by the vision of facilitating exploration of related genes in human and well-studied model organisms by providing a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with gene ontology data and human data. All data types represented in the Alliance portal (e.g. genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include a focus on coverage of additional model organisms including those without dedicated curation communities, and the inclusion of new data types with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.
- Published
- 2019
7. Saccharomyces genome database informs human biology
- Author
Sage T. Hellerstedt, J. Michael Cherry, Kevin A. MacPherson, Shuai Weng, Marek S. Skrzypek, Kalpana Karra, Stacia R. Engel, Robert S. Nash, Gail Binkley, Travis K. Sheppard, Stuart R. Miyasato, Matt Simison, and Edith D. Wong
- Subjects
0301 basic medicine; ved/biology.organism_classification_rank.species; Saccharomyces cerevisiae; Genes, Fungal; Computational biology; Biology; ENCODE; Saccharomyces; 03 medical and health sciences; Information resource; Species Specificity; Human biology; Databases, Genetic; Genetics; Database Issue; Humans; Model organism; Saccharomyces genome database; ved/biology; Genome, Human; biology.organism_classification; Budding yeast; 030104 developmental biology; Gene Ontology; Mutation; Genome, Fungal; Forecasting
- Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD.
- Published
- 2017
8. The Encyclopedia of DNA elements (ENCODE): data portal update
- Author
Aditi K. Narayanan, Benjamin C. Hitz, Timothy R. Dreszer, Kriti Jain, Otto Jolanki, Idan Gabdank, Keenan Graham, Kathrina C. Onate, Jason A. Hilton, Stuart R. Miyasato, J. Michael Cherry, Cricket A. Sloan, J. Seth Strattan, Carrie A. Davis, Esther T. Chan, Jean M. Davidson, Forrest Y. Tanaka, and Ulugbek K. Baymuradov
- Subjects
0301 basic medicine; Download; Interface (Java); Datasets as Topic; Genomics; Biology; Bioinformatics; ENCODE; World Wide Web; 03 medical and health sciences; Mice; User-Computer Interface; Databases, Genetic; Genetics; Database Issue; Animals; Humans; Caenorhabditis elegans; Metadata; Genome, Human; High-Throughput Nucleotide Sequencing; DNA; Visualization; 030104 developmental biology; Drosophila melanogaster; Gene Components; Encyclopedia; Data Display; Forecasting
- Abstract
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to meta(data) from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.
- Published
- 2017
9. The ENCODE Portal as an Epigenomics Resource
- Author
J. Seth Strattan, Khine Lin, Keenan Graham, Casey Litton, Emma O'Neill, Philip Adenekan, Jason A. Hilton, Paul Sud, Benjamin C. Hitz, Idan Gabdank, J. Michael Cherry, Yunhai Luo, Forrest Y. Tanaka, Zachary Myers, Jennifer Jou, Stuart R. Miyasato, Ulugbek K. Baymuradov, Otto Jolanki, Meenakshi S. Kagda, Jin-Wook Lee, and Bonita R. Lam
- Subjects
Epigenomics; Computer science; Genomics; ENCODE; Article; 03 medical and health sciences; Mice; Data file; Databases, Genetic; Animals; Humans; Protocol (object-oriented programming); 030304 developmental biology; 0303 health sciences; Internet; Metadata; Information retrieval; Genome, Human; 030305 genetics & heredity; General Medicine; DNA; DNA Methylation; Metadata modeling; Chromatin; ComputingMethodologies_PATTERNRECOGNITION; Human genome; Software
- Abstract
The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors. Basic Protocol: Query the portal. Support Protocol 1: Batch downloading. Support Protocol 2: Using the cart to download files. Support Protocol 3: Visualize data. Alternate Protocol: Query building and programmatic access.
- Published
- 2019
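The abstract of result 9 mentions searching the portal "programmatically via REST API". A minimal sketch of how such a query URL might be assembled is shown below. The `/search/` path, the `type`, `limit` and `format=json` parameters, and the `assay_title` filter are assumptions that do not appear in the abstract, and `build_search_url` is a hypothetical helper, not part of any ENCODE client. The sketch only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Portal address as given in the abstracts above.
PORTAL = "https://www.encodeproject.org"


def build_search_url(object_type, limit=25, **filters):
    """Return a search URL that requests JSON rather than an HTML page.

    The path and parameter names are assumptions for illustration;
    check the portal's own documentation before relying on them.
    """
    params = [("type", object_type)]
    # Sort filters so the resulting URL is deterministic.
    params += sorted(filters.items())
    params += [("limit", str(limit)), ("format", "json")]
    return f"{PORTAL}/search/?{urlencode(params)}"


url = build_search_url("Experiment", limit=5, assay_title="ChIP-seq")
print(url)
```

The resulting string could then be passed to any HTTP client with an `Accept: application/json` header.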
10. The Gene Ontology Resource: 20 years and still GOing strong
- Author
Rebecca Tauber, Robert J. Dodson, Marek S. Skrzypek, Raymond Lee, Valerie Wood, Paul W. Sternberg, C. Rivoire, Nancy H. Campbell, E. Hatton-Ellis, M. Rodriguez-Lopez, Elena Speretta, D. S. Osumi, Alix J. Rey, A. Mac-Dougall, Jane E. Mendel, Christopher J. Mungall, Helen Parkinson, Maria Jesus Martin, Pascale Gaudet, A. Stutz, Nathan Dunn, Gillian Millburn, Kate Warner, K. Axelsen, C. Arighi, Mary E. Dolan, M. J. Kesling, Barbara Kramarz, Seth Carbon, Joshua L. Goodman, Rachael P. Huntley, Anjali Shrivastava, Daniela Raciti, C. Wu, Victor B. Strelets, Steven J Marygold, H. Drabkin, M. Magrane, Benjamin M. Good, A. Shrivatsav Vp, Lorna Richardson, James P. Balhoff, P. Lemercier, E. Bakker, Amaia Sangrador-Vegas, Marc Feuermann, Paul Thomas, D. Lieberherr, J. Cho, Hans-Michael Müller, Robert S. Nash, Leonore Reiser, Birgit H M Meldal, Neil D. Rawlings, N. N. Hyka, D. A. Natale, Paola Roncaglia, Paul Denny, Michelle G. Giglio, Judith A. Blake, S. Sundaram, Shankar Subramaniam, Marcus C. Chibucos, Kevin A. MacPherson, S. Poux, Karen R. Christie, Mary Shimoyama, Eva Huala, Colin Logie, Huaiyu Mi, Felix Gondwe, K. Pichler, Petra Fey, Deborah A. Siegele, Phani V. Garapati, N. Tyagi, J L De Pons, Alex Bateman, Melinda R. Dwinell, Pablo Porras, Giulia Antonazzo, Midori A. Harris, Y. Lussi, Stuart R. Miyasato, Li Ni, K. Laiho, A. Estreicher, Travis K. Sheppard, Edith D. Wong, M. C. Harrison, H. Chen, S. Basu, Sandra A. LaBonte, Margaret Duesbury, E. Hartline, Sibyl Gao, Vítor Trovisco, Jacqueline Hayles, George Georghiou, Rex L. Chisholm, Kathleen Falls, S. Poudel, James C. Hu, G. T. Hayman, Kim Rutherford, F. Jungo, Hsin-Yu Chang, E. Boutet, Robert D. Finn, Alex L. Mitchell, Stan Laulederkind, J. H. Rawson, Marek Tutaj, Vanessa Acquaah, Peter D'Eustachio, G. Keller, L. Breuza, P. Garmiri, Nicholas H. Brown, Laurent-Philippe Albou, Antonia Lock, Nomi L. Harris, U. Hinz, Matthew Berriman, R. Britto, Rossana Zaru, Suzanna E. Lewis, N. Gruaz-Gumowski, Livia Perfetto, Matt Simison, Martin Kuiper, Shuai Weng, M. Tognolli, G. Dos Santos, Elizabeth R Bolton, Xiaosong Huang, A. Gos, P. Masson, David B. Emmert, Lisa Matthews, C. Casals-Casas, Kevin L. Howe, N. T. Del, Sandra Orchard, L. Famiglietti, Doug Howe, T. Sawford, T. E.M. Jones, Stephen G. Oliver, Kalpana Karra, S. Fexova, Tremayne Mushayahama, Dustin Ebert, Jim Thurmond, Ruth C. Lovering, E. Coudert, A. Bridge, Suzi Aleksander, Suvarna Nadendla, Christian A. Grove, David P. Hill, J. M. Cherry, M. C. Blatter, K. Van Auken, H. Bye-A-Jee, B. L. Dunn, A. Lreid, Sabrina Toro, Monte Westerfield, Z. Xie, A. Auchincloss, I. Pedruzzi, Anushya Muruganujan, B. Bely, S. H. Ahmad, Stacia R. Engel, Shur-Jen Wang, Gail Binkley, Lincoln Stein, Pinglei Zhou, G. P. Argoud, Marcio Luis Acencio, C. Hulo, Jürg Bähler, Juancarlos Chan, P. C. Ng, Helen Attrill, Mélanie Courtot, A. Ignatchenko, Tanya Z. Berardini, D. Sitnikov, Eric Douglass, and A. Shypitsyna
- Subjects
Quality Control; media_common.quotation_subject; Ontology (information science); Biology; History, 21st Century; Filter (software); Unique identifier; World Wide Web; 03 medical and health sciences; 0302 clinical medicine; Resource (project management); Web page; Genetics; Animals; Humans; Database Issue; Quality (business); Function (engineering); Molecular Biology; 030304 developmental biology; media_common; 0303 health sciences; Focus (computing); Bacteria; Eukaryota; Molecular Sequence Annotation; History, 20th Century; High-Throughput Screening Assays; Gene Ontology; Mitogen-Activated Protein Kinases; 030217 neurology & neurosurgery
- Abstract
The Gene Ontology resource (GO; http://geneontology.org) provides structured, computable knowledge regarding the functions of genes and gene products. Founded in 1998, GO has become widely adopted in the life sciences, and its contents are under continual improvement, both in quantity and in quality. Here, we report the major developments of the GO resource during the past two years. Each monthly release of the GO resource is now packaged and given a unique identifier (DOI), enabling GO-based analyses on a specific release to be reproduced in the future. The molecular function ontology has been refactored to better represent the overall activities of gene products, with a focus on transcription regulator activities. Quality assurance efforts have been ramped up to address potentially out-of-date or inaccurate annotations. New evidence codes for high-throughput experiments now enable users to filter out annotations obtained from these sources. GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models. We also provide the ‘GO ribbon’ widget for visualizing GO annotations to a gene; the widget can be easily embedded in any web page. This is an open access article distributed under the terms of the Creative Commons CC BY license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- Published
- 2018
11. Prevention of data duplication for high throughput sequencing repositories
- Author
J. Seth Strattan, Carrie A. Davis, Forrest Y. Tanaka, Benjamin C. Hitz, J. Michael Cherry, Keenan Graham, Jean M. Davidson, Jason A. Hilton, Idan Gabdank, Kathrina C. Onate, Stuart R. Miyasato, Otto Jolanki, Timothy R. Dreszer, Esther T. Chan, Aditi K. Narayanan, Ulugbek K. Baymuradov, and Cricket A. Sloan
- Subjects
0301 basic medicine; Computer science; business.industry; Extramural; MEDLINE; Computational biology; General Biochemistry, Genetics and Molecular Biology; DNA sequencing; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Text mining; Data deduplication; Original Article; Databases, Nucleic Acid; General Agricultural and Biological Sciences; business; Data Curation; 030217 neurology & neurosurgery; Information Systems
- Abstract
Prevention of unintended duplication is one of the ongoing challenges many databases have to address. Working with high-throughput sequencing data, the complexity of that challenge increases with the complexity of the definition of a duplicate. In a computational data model, a data object represents a real entity like a reagent or a biosample. This representation is similar to how a card represents a book in a paper library catalog. Duplicated data objects not only waste storage, they can mislead users into assuming the model represents more than the single entity. Even if it is clear that two objects represent a single entity, data duplication opens the door to potential inconsistencies between the objects since the content of the duplicated objects can be updated independently, allowing divergence of the metadata associated with the objects. Analogously to a situation in which a catalog in a paper library would contain by mistake two cards for a single copy of a book. If these cards are listing simultaneously two different individuals as current book borrowers, it would be difficult to determine which borrower (out of the two listed) actually has the book. Unfortunately, in a large database with multiple submitters, unintended duplication is to be expected. In this article, we present three principal guidelines the Encyclopedia of DNA Elements (ENCODE) Portal follows in order to prevent unintended duplication of both actual files and data objects: definition of identifiable data objects (I), object uniqueness validation (II) and de-duplication mechanism (III). In addition to explaining our modus operandi, we elaborate on the methods used for identification of sequencing data files. Comparison of the approach taken by the ENCODE Portal vs other widely used biological data repositories is provided. Database URL: https://www.encodeproject.org/
- Published
- 2018
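The "object uniqueness validation" guideline named in the abstract of result 11 can be illustrated with a small sketch. This is not the ENCODE Portal's implementation: the abstract does not say how files are identified, so the use of a SHA-256 content digest, the `FileRegistry` class and the accession strings below are all assumptions made for illustration.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Identify a file by its content (assumed scheme: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class FileRegistry:
    """Hypothetical registry that rejects a second object with the same content."""

    def __init__(self):
        self._by_digest = {}  # digest -> accession of the first object registered

    def register(self, accession: str, data: bytes) -> str:
        digest = content_digest(data)
        existing = self._by_digest.get(digest)
        if existing is not None and existing != accession:
            # Same bytes submitted under a different identifier: refuse it,
            # so two objects can never diverge while describing one file.
            raise ValueError(f"{accession} duplicates {existing}")
        self._by_digest[digest] = accession
        return digest


registry = FileRegistry()
registry.register("FILE-A", b"ACGTACGT")
try:
    registry.register("FILE-B", b"ACGTACGT")  # identical content, new name
except ValueError as err:
    print("rejected:", err)
```

Checking at submission time, rather than cleaning up later, means the two catalog cards in the abstract's library analogy are never both created.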
12. The Reference Genome Sequence of Saccharomyces cerevisiae: Then and Now
- Author
Edith D. Wong, Maria C. Costanzo, Dianna G. Fisk, Marek S. Skrzypek, Selina S. Dwight, Fred S. Dietrich, Paul Lloyd, Robert S. Nash, Kalpana Karra, Stacia R. Engel, Gail Binkley, Matt Simison, J. Michael Cherry, Benjamin C. Hitz, Stuart R. Miyasato, Rama Balakrishnan, and Shuai Weng
- Subjects
Databases, Factual; Sequence analysis; Saccharomyces cerevisiae; Investigations; ENCODE; genome release; Genome; Open Reading Frames; User-Computer Interface; 03 medical and health sciences; 0302 clinical medicine; Genetics; model organism; Molecular Biology; Genetics (clinical); 030304 developmental biology; Whole genome sequencing; Internet; 0303 health sciences; biology; reference sequence; Chromosome Mapping; Sequence Analysis, DNA; Genome project; S288C; biology.organism_classification; Yeast; Genome, Fungal; 030217 neurology & neurosurgery; Reference genome
- Abstract
The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called “S288C 2010,” was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science.
- Published
- 2014
13. The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations
- Author
Gavin Sherlock, Farrell Wymore, Martha B. Arnaud, Stuart R. Miyasato, Marek S. Skrzypek, Jonathan Binkley, Joshua Orvis, Matt Simison, Prachi Shah, Diane O. Inglis, Gustavo C. Cerqueira, Jennifer R. Wortman, and Gail Binkley
- Subjects
Genes, Fungal; Genome; Aspergillus fumigatus; 03 medical and health sciences; Aspergillus oryzae; Aspergillus nidulans; Databases, Genetic; Genetics; skin and connective tissue diseases; 030304 developmental biology; Comparative genomics; 0303 health sciences; Aspergillus; Internet; biology; 030306 microbiology; Sequence Analysis, RNA; Gene Expression Profiling; Aspergillus niger; Molecular Sequence Annotation; biology.organism_classification; Genome, Fungal; IV. Viruses, bacteria, protozoa and fungi
- Abstract
The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequenced tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.
- Published
- 2013
14. The Candida Genome Database: The new homology information page highlights protein similarity and phylogeny
- Author
Marek S. Skrzypek, Gail Binkley, Farrell Wymore, Martha B. Arnaud, Stuart R. Miyasato, Jonathan Binkley, Matt Simison, Diane O. Inglis, Prachi Shah, and Gavin Sherlock
- Subjects
Genetics; Internet; 0303 health sciences; Fungal protein; Sequence Homology, Amino Acid; 030306 microbiology; Genome database; Locus (genetics); Computational biology; Biology; Genome; Homology (biology); Fungal Proteins; 03 medical and health sciences; ComputingMethodologies_PATTERNRECOGNITION; Protein similarity; Phylogenetics; Databases, Genetic; Genome, Fungal; Gene; Phylogeny; IV. Viruses, bacteria, protozoa and fungi; Candida; 030304 developmental biology
- Abstract
The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The goal of CGD is to facilitate and accelerate research into Candida pathogenesis and biology. The CGD Web site is organized around Locus pages, which display information collected about individual genes. Locus pages have multiple tabs for accessing different types of information; the default Summary tab provides an overview of the gene name, aliases, phenotype and Gene Ontology curation, whereas other tabs display more in-depth information, including protein product details for coding genes, notes on changes to the sequence or structure of the gene and a comprehensive reference list. Here, in this update to previous NAR Database articles featuring CGD, we describe a new tab that we have added to the Locus page, entitled the Homology Information tab, which displays phylogeny and gene similarity information for each locus.
- Published
- 2013
15. The Candida Genome Database (CGD): incorporation of Assembly 22, systematic identifiers and visualization of high throughput sequencing data
- Author
Stuart R. Miyasato, Marek S. Skrzypek, Jonathan Binkley, Matt Simison, Gail Binkley, and Gavin Sherlock
- Subjects
0301 basic medicine; 030106 microbiology; Genomics; Computational biology; Biology; Web Browser; Genome; DNA sequencing; Fungal Proteins; 03 medical and health sciences; Open Reading Frames; Genetics; Database Issue; Gene; Sequence (medicine); Candida; Whole genome sequencing; Fungal protein; Computational Biology; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; 030104 developmental biology; Genome, Fungal; Databases, Nucleic Acid; Software
- Abstract
The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The mission of CGD is to facilitate and accelerate research into Candida pathogenesis and biology, by curating the scientific literature in real time, and connecting literature-derived annotations to the latest version of the genomic sequence and its annotations. Here, we report the incorporation into CGD of Assembly 22, the first chromosome-level, phased diploid assembly of the C. albicans genome, coupled with improvements that we have made to the assembly using additional available sequence data. We also report the creation of systematic identifiers for C. albicans genes and sequence features using a system similar to that adopted by the yeast community over two decades ago. Finally, we describe the incorporation of JBrowse into CGD, which allows online browsing of mapped high throughput sequencing data, and its implementation for several RNA-Seq data sets, as well as the whole genome sequencing data that was used in the construction of Assembly 22.
- Published
- 2016
16. Saccharomyces Genome Database: the genomics resource of budding yeast
- Author
-
Marek S. Skrzypek, Eurie L. Hong, Edith D. Wong, Cynthia J. Krieger, Selina S. Dwight, Stuart R. Miyasato, Maria C. Costanzo, Robert S. Nash, Jodi E. Hirschman, Esther T. Chan, Kalpana Karra, Benjamin C. Hitz, Julie Park, Dianna G. Fisk, J. Michael Cherry, Karen R. Christie, Shuai Weng, Matt Simison, Rama Balakrishnan, Stacia R. Engel, Gail Binkley, and Craig Amundsen
- Subjects
Genes, Fungal ,Saccharomyces cerevisiae ,Genomics ,Genome browser ,Computational biology ,Saccharomyces ,Genome ,03 medical and health sciences ,0302 clinical medicine ,Terminology as Topic ,Databases, Genetic ,Web page ,Genetics ,030304 developmental biology ,0303 health sciences ,biology ,High-Throughput Nucleotide Sequencing ,Molecular Sequence Annotation ,Articles ,biology.organism_classification ,Phenotype ,ComputingMethodologies_PATTERNRECOGNITION ,Encyclopedia ,Genome, Fungal ,Software ,030217 neurology & neurosurgery - Abstract
The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use.
- Published
- 2011
17. The Gene Ontology: enhancements for 2011
- Author
-
P D'Eustachio, Benjamin C. Hitz, Julie Park, Paul Browne, Douglas G. Howe, Cynthia J. Krieger, Kalpana Karra, Stan Laulederkind, Karen R. Christie, Susan Tweedie, Eurie L. Hong, Lydie Bougueleret, Michele Magrane, Cathy R. Gresham, Rolf Apweiler, Lisa Matthews, Dong Li, Philippa J. Talmud, Ioannis Xenarios, J. M. Cherry, Tanya Z. Berardini, Deborah A. Siegele, Rama Balakrishnan, D. Sitnikov, A. Auchinchloss, Selina S. Dwight, Tony Sawford, Paul J. Kersey, Ruth C. Lovering, Ruth Y. Eberhardt, Ursula Hinz, Lakshmi Pillai, Sylvain Poux, Edith D. Wong, Klemens Pichler, Kati Laiho, Malcolm J. Gardner, Stephen G. Oliver, Lionel Breuza, Kara Dolinski, P Lemercier, Kristian B. Axelsen, Midori A. Harris, Adrienne E. Zweifel, H. Drabkin, Guillaume Keller, Marek S. Skrzypek, Daniel M. Staines, Fiona M. McCarthy, Nicholas H. Brown, Mark D. McDowall, Antonia Lock, Mary Shimoyama, Maria C. Costanzo, Teresia Buza, S. Jimenez, Rex L. Chisholm, Paul W. Sternberg, Hui Wang, Nadine Gruaz-Gumowski, Chantal Hulo, Rebecca E. Foulger, Melinda R. Dwinell, Judith A. Blake, Marcus C. Chibucos, B. K. McIntosh, C. D. Amundsen, Jane Lomax, L Famiglietti, Tom Hayman, Michael Tognolli, Eva Huala, James C. Hu, Patrick Masson, Maria Jesus Martin, Benoit Bely, Shuai Weng, Heather C. Wick, E. Dimmer, L. Ni, Catherine Rivoire, Christopher J. Mungall, H. Sehra, P. Duek-Roggli, Maria Victoria Schneider, Dianna G. Fisk, Michael S. Livstone, Ivo Pedruzzi, Shyamala Sundaram, Donna K. Slonim, Isabelle Cusin, Stuart R. Miyasato, Timothy F. Lowry, Varsha K. Khodiyar, Seth Carbon, Elisabeth Coudert, Jürg Bähler, Juancarlos Chan, Evelyn Camon, Daniel P. Renfro, Anne Estreicher, M. C. Blatter, Robert S. Nash, P Gaudet, Sven Heinicke, K. Van Auken, Stacia R. Engel, Alan Bridge, Ralf Stephan, Mary E. Dolan, Shane C. Burgess, Petra Fey, Shur-Jen Wang, Damien Lieberherr, Duncan Legge, P. Porras Millán, Andre Stutz, Yasmin Alam-Faruque, Gail Binkley, Bernd Roechert, S. Branconi-Quintaje, Ghislaine Argoud-Puy, S. Basu, Kim Rutherford, M. Moinat, Monte Westerfield, Arnaud Gos, Eleanor J Stanley, Valerie Wood, Ranjana Kishore, Diego Poggioli, S. Ferro-Rojas, Victoria Petri, Florence Jungo, Suzanna E. Lewis, Emmanuel Boutet, Warren A. Kibbe, M Feuermann, Claire O'Donovan, W. M. Chan, J. James, David P. Hill, Rachael P. Huntley, M. Gwinn Giglio, Paul Thomas, Jodi E. Hirschman, Paola Roncaglia, and Gene Ontology Consortium
- Subjects
Quality Control ,0303 health sciences ,media_common.quotation_subject ,Databases, Genetic ,Molecular Sequence Annotation/standards ,Vocabulary, Controlled ,Inference ,Molecular Sequence Annotation ,Articles ,Biology ,Ontology (information science) ,World Wide Web ,Open Biomedical Ontologies ,03 medical and health sciences ,Annotation ,0302 clinical medicine ,Resource (project management) ,Controlled vocabulary ,Genetics ,Social media ,Function (engineering) ,030217 neurology & neurosurgery ,030304 developmental biology ,media_common - Abstract
The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.
- Published
- 2011
18. The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources
- Author
-
Martha B. Arnaud, Joshua Orvis, Stuart R. Miyasato, Jonathan Binkley, Farrell Wymore, Marek S. Skrzypek, Prachi Shah, Gavin Sherlock, Marcus C. Chibucos, Clinton Howarth, Matt Simison, Gail Binkley, Jonathan Crabtree, Diane O. Inglis, Gustavo C. Cerqueira, and Jennifer R. Wortman
- Subjects
Genes, Fungal ,Genomics ,Genome ,Aspergillus nidulans ,World Wide Web ,03 medical and health sciences ,Annotation ,Resource (project management) ,Databases, Genetic ,Genetics ,skin and connective tissue diseases ,030304 developmental biology ,Comparative genomics ,0303 health sciences ,Aspergillus ,biology ,030306 microbiology ,business.industry ,Aspergillus fumigatus ,Molecular Sequence Annotation ,Articles ,biology.organism_classification ,Biotechnology ,Genome, Fungal ,business - Abstract
The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available, web-based resource for researchers studying fungi of the genus Aspergillus, which includes organisms of clinical, agricultural and industrial importance. AspGD curators have now completed comprehensive review of the entire published literature about Aspergillus nidulans and Aspergillus fumigatus, and this annotation is provided with streamlined, ortholog-based navigation of the multispecies information. AspGD facilitates comparative genomics by providing a full-featured genomics viewer, as well as matched and standardized sets of genomic information for the sequenced aspergilli. AspGD also provides resources to foster interaction and dissemination of community information and resources. We welcome and encourage feedback at aspergillus-curator@lists.stanford.edu.
- Published
- 2011
19. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata
- Author
-
Farrell Wymore, Gavin Sherlock, Stuart R. Miyasato, Marek S. Skrzypek, Martha B. Arnaud, Jonathan Binkley, Diane O. Inglis, Prachi Shah, Gail Binkley, and Matt Simison
- Subjects
Genes, Fungal ,Locus (genetics) ,Genomics ,Candida glabrata ,Computational biology ,Genome ,Microbiology ,Fungal Proteins ,03 medical and health sciences ,Candida albicans ,Databases, Genetic ,Genetics ,030304 developmental biology ,Candida ,0303 health sciences ,Fungal protein ,biology ,030306 microbiology ,Articles ,biology.organism_classification ,Corpus albicans ,Genome, Fungal ,Software ,Reference genome - Abstract
The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.
- Published
- 2011
20. Saccharomyces Genome Database provides mutant phenotype data
- Author
-
Marek S. Skrzypek, Gail Binkley, Michael S. Livstone, Rose Oughtred, Shuai Weng, Stuart R. Miyasato, David Botstein, Eurie L. Hong, Rama Balakrishnan, Jodi E. Hirschman, Robert S. Nash, Benjamin C. Hitz, Maria C. Costanzo, Julie Park, Cynthia J. Krieger, Dianna G. Fisk, Stacia R. Engel, Kara Dolinski, Edith D. Wong, Karen R. Christie, Selina S. Dwight, and J. Michael Cherry
- Subjects
Protein domain ,Mutant ,Saccharomyces cerevisiae ,Genes, Fungal ,Information Storage and Retrieval ,Biology ,medicine.disease_cause ,Saccharomyces ,Databases, Genetic ,Genetics ,medicine ,DNA, Fungal ,Databases, Protein ,Mutation ,Internet ,Saccharomyces genome database ,Computational Biology ,Articles ,biology.organism_classification ,Phenotype ,Yeast ,Protein Structure, Tertiary ,Genome, Fungal ,Databases, Nucleic Acid ,Software - Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is a scientific database for the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker’s or budding yeast. The information in SGD includes functional annotations, mapping and sequence information, protein domains and structure, expression data, mutant phenotypes, physical and genetic interactions and the primary literature from which these data are derived. Here we describe how published phenotypes and genetic interaction data are annotated and displayed in SGD.
- Published
- 2009
21. New tools at the Candida Genome Database: biochemical pathways and full-text literature search
- Author
-
Stuart R. Miyasato, Maria C. Costanzo, Marek S. Skrzypek, Prachi Shah, Martha B. Arnaud, Gail Binkley, Gavin Sherlock, and Diane O. Inglis
- Subjects
Genes, Fungal ,Information Storage and Retrieval ,Context (language use) ,Genome ,Open Reading Frames ,User-Computer Interface ,03 medical and health sciences ,Data sequences ,Candida albicans ,Databases, Genetic ,Genetics ,DNA, Fungal ,Databases, Protein ,Gene ,030304 developmental biology ,Internet ,0303 health sciences ,biology ,030306 microbiology ,Genome database ,Fungal genetics ,Computational Biology ,Articles ,biology.organism_classification ,Protein Structure, Tertiary ,3. Good health ,Genome, Fungal ,Analysis tools ,Databases, Nucleic Acid ,Software - Abstract
The Candida Genome Database (CGD, http://www.candidagenome.org/) provides online access to genomic sequence data and manually curated functional information about genes and proteins of the human pathogen Candida albicans. Herein, we describe two recently added features, Candida Biochemical Pathways and the Textpresso full-text literature search tool. The Biochemical Pathways tool provides visualization of metabolic pathways and analysis tools that facilitate interpretation of experimental data, including results of large-scale experiments, in the context of Candida metabolism. Textpresso for Candida allows searching through the full-text of Candida-specific literature, including clinical and epidemiological studies.
- Published
- 2009
22. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community
- Author
-
Maria C. Costanzo, Gail Binkley, Marek S. Skrzypek, Stuart R. Miyasato, Martha B. Arnaud, Joshua Orvis, Jonathan Crabtree, Diane O. Inglis, Gavin Sherlock, Marcus C. Chibucos, Prachi Shah, Jennifer R. Wortman, and Adil Lotia
- Subjects
Genes, Fungal ,Information Storage and Retrieval ,Genomics ,Scientific literature ,Computational biology ,Genome ,Aspergillus nidulans ,Fungal Proteins ,03 medical and health sciences ,Resource (project management) ,Databases, Genetic ,Genetics ,Databases, Protein ,Gene ,030304 developmental biology ,Comparative genomics ,Internet ,0303 health sciences ,Aspergillus ,Fungal protein ,Models, Genetic ,biology ,030306 microbiology ,Computational Biology ,Articles ,biology.organism_classification ,Protein Structure, Tertiary ,Phenotype ,Genome, Fungal ,Databases, Nucleic Acid ,Software - Abstract
The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.
- Published
- 2009
23. The Gene Ontology project in 2008
- Author
-
John Day Richter, Rex L. Chisholm, Carol J. Bult, Petra Fey, Michael S. Livstone, Susan Bromberg, Evelyn Camon, Suzanna E. Lewis, Janan T. Eppig, Emily Dimmer, Mary Shimoyama, Ni Li, Rose Oughtred, Rolf Apweiler, Stuart R. Miyasato, Edith D. Wong, Tanya Z. Berardini, Maria C. Costanzo, Christopher J. Mungall, David P. Hill, Ruth C. Lovering, Valerie Wood, Marek S. Skrzypek, Jodi E. Hirschman, J. Michael Cherry, Li Donghui, Seth Carbon, Jennifer R. Wortman, Kara Dolinski, Giorgio Valle, Kathy K. Zhu, Susan Tweedie, Shane C. Burgess, Stacia R. Engel, Trudy Torto Alalibo, Paul W. Sternberg, Fiona M. McCarthy, Pankaj Jaiswal, Doug Howe, Ranjana Kishore, Jennifer I. Deegan, Warren A. Kibbe, Gail Binkley, Simon N. Twigger, Harold J. Drabkin, Erika Feltrin, Martin Aslett, Qing Dong, Matthew Berriman, David Botstein, Victoria Petri, Pascale Gaudet, Candace Collmer, Shuai Weng, Cynthia J. Krieger, Linda Hannick, Dianna G. Fisk, Robert S. Nash, Rachael P. Huntley, Nicola Mulder, Jennifer L. Smith, Sue Povey, Seung Y. Rhee, Stan Laulederkind, Benjamin C. Hitz, Julie Park, Howard J. Jacob, Midori A. Harris, Michelle G. Giglio, Judith A. Blake, Martin Ringwald, Erich M. Schwarz, Daniel Barrell, Rama Balakrishnan, Alexander D. Diehl, Trent E. Seigfried, Amelia Ireland, Eurie L. Hong, Jane Lomax, Karen Eilbeck, Michael Ashburner, Karen R. Christie, Kimberly Van Auken, Mary E. Dolan, Varsha K. Khodiyar, and Monte Westerfield
- Subjects
Interface (Java) ,Genomics ,Biology ,Bioinformatics ,Vocabulary ,World Wide Web ,Open Biomedical Ontologies ,Databases ,03 medical and health sciences ,Annotation ,Mice ,User-Computer Interface ,0302 clinical medicine ,Resource (project management) ,Genetic ,Controlled vocabulary ,Databases, Genetic ,Genetics ,Animals ,Humans ,Sequence Ontology ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,030304 developmental biology ,0303 health sciences ,Internet ,business.industry ,Articles ,Rats ,Sequence Analysis ,Vocabulary, Controlled ,030220 oncology & carcinogenesis ,The Internet ,ComputingMethodologies_GENERAL ,Controlled ,business ,Caltech Library Services - Abstract
The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org/). The ontologies have been extended and refined for several biological areas, and improvements to the structure of the ontologies have been implemented. To improve the quantity and quality of gene product annotations available from its public repository, the GO Consortium has launched a focused effort to provide comprehensive and detailed annotation of orthologous genes across a number of ‘reference’ genomes, including human and several key model organisms. Software developments include two releases of the ontology-editing tool OBO-Edit, and improvements to the AmiGO browser interface.
- Published
- 2007
24. Expanded protein information at SGD: new pages and proteome browser
- Author
-
Rama Balakrishnan, Chandra L. Theesfeld, Robert S. Nash, Maria C. Costanzo, J. Michael Cherry, Kara Dolinski, Marek S. Skrzypek, Eurie L. Hong, Mark Schroeder, David Botstein, Shuai Weng, Michael S. Livstone, Stacia R. Engel, Selina S. Dwight, Christopher Lane, Gail Binkley, Benjamin C. Hitz, Julie Park, Stuart R. Miyasato, Jodi E. Hirschman, Karen R. Christie, Anand Sethuraman, Dianna G. Fisk, Qing Dong, and Rose Oughtred
- Subjects
Proteomics ,Internet ,Saccharomyces cerevisiae Proteins ,Information retrieval ,Protein family ,business.industry ,Saccharomyces cerevisiae ,Articles ,Biology ,Bioinformatics ,Visualization ,User-Computer Interface ,ComputingMethodologies_PATTERNRECOGNITION ,Protein Annotation ,Sequence Analysis, Protein ,Web page ,Proteome ,Computer Graphics ,Genetics ,The Internet ,Genome, Fungal ,Databases, Protein ,business ,Hidden Markov model - Abstract
The recent explosion in protein data generated from both directed small-scale studies and large-scale proteomics efforts has greatly expanded the quantity of available protein information and has prompted the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) to enhance the depth and accessibility of protein annotations. In particular, we have expanded ongoing efforts to improve the integration of experimental information and sequence-based predictions and have redesigned the protein information web pages. A key feature of this redesign is the development of a GBrowse-derived interactive Proteome Browser customized to improve the visualization of sequence-based protein information. This Proteome Browser has enabled SGD to unify the display of hidden Markov model (HMM) domains, protein family HMMs, motifs, transmembrane regions, signal peptides, hydropathy plots and profile hits using several popular prediction algorithms. In addition, a physico-chemical properties page has been introduced to provide easy access to basic protein information. Improvements to the layout of the Protein Information page and integration of the Proteome Browser will facilitate the ongoing expansion of sequence-specific experimental information captured in SGD, including post-translational modifications and other user-defined annotations. Finally, SGD continues to improve upon the availability of genetic and physical interaction data in an ongoing collaboration with BioGRID by providing direct access to more than 82,000 manually-curated interactions.
- Published
- 2007
25. Ontology application and use at the ENCODE DCC
- Author
-
Marcus Ho, Stuart R. Miyasato, W. James Kent, J. Seth Strattan, Jean M. Davidson, Nikhil R. Podduturi, Cricket A. Sloan, Greg Roe, Eurie L. Hong, Laurence D. Rowe, Brian T. Lee, Esther T. Chan, J. Michael Cherry, Drew T. Erickson, Forrest Y. Tanaka, Benjamin C. Hitz, Venkat S. Malladi, and Matt Simison
- Subjects
Information retrieval ,Transcription, Genetic ,Standardization ,Computer science ,Experimental data ,Molecular Sequence Annotation ,Ontology (information science) ,ENCODE ,General Biochemistry, Genetics and Molecular Biology ,Set (abstract data type) ,World Wide Web ,Metadata ,Mice ,Gene Ontology ,Databases, Genetic ,Encyclopedia ,Animals ,Humans ,Original Article ,Gene Regulatory Networks ,General Agricultural and Biological Sciences ,Data Curation ,Information Systems - Abstract
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC’s use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects. Database URL: https://www.encodeproject.org/
- Published
- 2015
26. Sequence resources at the Candida Genome Database
- Author
-
Stuart R. Miyasato, Martha B. Arnaud, Gail Binkley, Marek S. Skrzypek, Gavin Sherlock, Maria C. Costanzo, Christopher Lane, and Prachi Shah
- Subjects
congenital, hereditary, and neonatal diseases and abnormalities ,Genes, Fungal ,Sequence assembly ,Sequence Homology ,Genomics ,Computational biology ,Biology ,Genome ,Fungal Proteins ,03 medical and health sciences ,User-Computer Interface ,hemic and lymphatic diseases ,Candida albicans ,Databases, Genetic ,Genetics ,DNA, Fungal ,Gene ,030304 developmental biology ,Sequence (medicine) ,Whole genome sequencing ,0303 health sciences ,Fungal protein ,Internet ,030306 microbiology ,Genome database ,Articles ,Genome, Fungal - Abstract
The Candida Genome Database (CGD, http://www.candidagenome.org/) contains a curated collection of genomic information and community resources for researchers who are interested in the molecular biology of the opportunistic pathogen Candida albicans. With the recent release of a new assembly of the C. albicans genome, Assembly 20, C. albicans genomics has entered a new era. Although the C. albicans genome assembly continues to undergo refinement, multiple assemblies and gene nomenclatures will remain in widespread use by the research community. CGD has now taken on the responsibility of maintaining the most up-to-date version of the genome sequence by providing the data from this new assembly alongside the data from the previous assemblies, as well as any future corrections and refinements. In this database update, we describe the sequence information available for C. albicans, the sequence information contained in CGD, and the tools for sequence retrieval, analysis and comparison that CGD provides. CGD is freely accessible at http://www.candidagenome.org/ and CGD curators may be contacted by email at candida-curator@genome.stanford.edu.
- Published
- 2006
27. The Candida Genome Database (CGD), a community resource for Candida albicans gene and protein information
- Author
-
Martha B. Arnaud, Gail Binkley, Marek S. Skrzypek, Maria C. Costanzo, Stuart R. Miyasato, Gavin Sherlock, and Christopher Lane
- Subjects
Genomics ,Context (language use) ,Computational biology ,Biology ,Bioinformatics ,Genome ,Fungal Proteins ,Gene product ,User-Computer Interface ,03 medical and health sciences ,hemic and lymphatic diseases ,Candida albicans ,Databases, Genetic ,Genetics ,Gene ,030304 developmental biology ,Internet ,0303 health sciences ,Fungal protein ,030306 microbiology ,Articles ,biology.organism_classification ,Systems Integration ,Gene nomenclature ,Genome, Fungal - Abstract
The Candida Genome Database (CGD) is a new database that contains genomic information about the opportunistic fungal pathogen Candida albicans. CGD is a public resource for the research community that is interested in the molecular biology of this fungus. CGD curators are in the process of combing the scientific literature to collect all C. albicans gene names and aliases; to assign gene ontology terms that describe the molecular function, biological process, and subcellular localization of each gene product; to annotate mutant phenotypes; and to summarize the function and biological context of each gene product in free-text description lines. CGD also provides community resources, including a reservation system for gene names and a colleague registry through which Candida researchers can share contact information and research interests. CGD is publicly funded (by NIH grant R01 DE15873-01 from the NIDCR) and is freely available at http://www.candidagenome.org/.
- Published
- 2004
28. Gene Ontology annotations at SGD: new data sources and annotation methods
- Author
-
Stuart R. Miyasato, Rama Balakrishnan, Shuai Weng, Dianna G. Fisk, Eurie L. Hong, David Botstein, Robert S. Nash, Jodi E. Hirschman, Marek S. Skrzypek, Edith D. Wong, Selina S. Dwight, Michael S. Livstone, Stacia R. Engel, Kathy K. Zhu, J. Michael Cherry, Benjamin C. Hitz, Rose Oughtred, Julie Park, Kara Dolinski, Gail Binkley, Karen R. Christie, Cynthia J. Krieger, Maria C. Costanzo, and Qing Dong
- Subjects
Genetics ,Data source ,Internet ,Information retrieval ,Saccharomyces cerevisiae Proteins ,Gene ontology ,Genes, Fungal ,Computational Biology ,Genomics ,Saccharomyces cerevisiae ,Articles ,Biology ,Genome ,Annotation ,User-Computer Interface ,Vocabulary, Controlled ,Controlled vocabulary ,Databases, Genetic ,UniProt ,Experimental methods ,Genome, Fungal ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) - Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current.
- Published
- 2007
29. The Candida Genome Database: facilitating research on Candida albicans molecular biology
- Author
-
Stuart R. Miyasato, Martha B. Arnaud, Gavin Sherlock, Marek S. Skrzypek, Maria C. Costanzo, Christopher Lane, and Gail Binkley
- Subjects
Base Sequence ,Extramural ,Genome database ,General Medicine ,Biology ,biology.organism_classification ,Applied Microbiology and Biotechnology ,Microbiology ,Genome ,Molecular biology ,Corpus albicans ,Annotation ,Biological literature ,hemic and lymphatic diseases ,Research community ,Terminology as Topic ,Databases, Genetic ,natural sciences ,Registries ,Genome, Fungal ,Candida albicans ,Alleles ,Candida - Abstract
The Candida Genome Database (CGD; http://www.candidagenome.org) is a resource for information about the Candida albicans genomic sequence and the molecular biology of its encoded gene products. CGD collects and organizes data from the biological literature concerning C. albicans, and provides tools for viewing, searching, analysing, and downloading these data. CGD also serves as an organizing centre for the C. albicans research community, providing a gene-name registry, contact information, and research community news. This article describes the information contained in CGD and how to access it, either from the perspective of a bench scientist interested in the function of one or a few genes, or from the perspective of a biologist or bioinformatician interpreting large-scale functional genomic datasets.
- Published
- 2006
30. Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the Saccharomyces cerevisiae genome
- Author
-
Stuart R. Miyasato, David Botstein, Anand Sethuraman, Mayank K. Thanawala, Shuai Weng, Chandra L. Theesfeld, Kara Dolinski, J. Michael Cherry, Dianna G. Fisk, Mark Schroeder, Qing Dong, Rose Oughtred, Michael S. Livstone, Julie Park, Karen R. Christie, Eurie L. Hong, Marek S. Skrzypek, Selina S. Dwight, Stacia R. Engel, Rama Balakrishnan, Rey Andrada, Maria C. Costanzo, Gail Binkley, Jennifer M. Williams, Barry Starr, Christopher Lane, Jodi E. Hirschman, and Robert S. Nash
- Subjects
Saccharomyces cerevisiae Proteins ,Genomics ,Saccharomyces cerevisiae ,ENCODE ,Genome ,Saccharomyces ,Article ,03 medical and health sciences ,Annotation ,User-Computer Interface ,0302 clinical medicine ,Databases, Genetic ,Genetics ,Computer Graphics ,natural sciences ,030304 developmental biology ,0303 health sciences ,Internet ,biology ,Genome project ,biology.organism_classification ,ComputingMethodologies_PATTERNRECOGNITION ,Snapshot (computer storage) ,ComputingMethodologies_GENERAL ,Chromosomes, Fungal ,Genome, Fungal ,030217 neurology & neurosurgery ,Reference genome - Abstract
Sequencing and annotation of the entire Saccharomyces cerevisiae genome has made it possible to gain a genome-wide perspective on yeast genes and gene products. To make this information available on an ongoing basis, the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) has created the Genome Snapshot (http://db.yeastgenome.org/cgi-bin/genomeSnapShot.pl). The Genome Snapshot summarizes the current state of knowledge about the genes and chromosomal features of S. cerevisiae. The information is organized into two categories: (i) number of each type of chromosomal feature annotated in the genome and (ii) number and distribution of genes annotated to Gene Ontology terms. Detailed lists are accessible through SGD's Advanced Search tool (http://db.yeastgenome.org/cgi-bin/search/featureSearch), and all the data presented on this page are available from the SGD ftp site (ftp://ftp.yeastgenome.org/yeast/).