28 results for "Juancarlos Chan"
Search Results
2. Alliance of Genome Resources Portal: unified model organism research platform
- Author
-
Adam Wright, Paul W. Sternberg, Daniela Raciti, Monika Tutaj, Josh Goodman, Ken Frazer, Paul Thomas, Scott Cain, Raymond Lee, Judith A. Blake, Patrick Kalita, Ajay Shrivatsav, Julie Agapite, Marek S. Skrzypek, Hans-Michael Mueller, Wen J. Chen, Karen Yook, Gillian Millburn, Joanna Argasinska, David Fashena, Kevin Schaper, Joel E. Richardson, Douglas G. Howe, Barbara Dunn, Yvonne M. Bradford, Nathan Dunn, Jaehyoung Cho, Ranjana Kishore, Kalpana Karra, Sabrina Toro, Anne E. Eagle, Norbert Perrimon, Anushya Muruganujan, Beverley B. Matthews, Christian A. Grove, Edith D. Wong, Monte Westerfield, Olin Blodgett, Gary Williams, Jose-Maria Urbano, Marie-Claire Harrison, Steven J Marygold, Tremayne Mushayahama, Marek Tutaj, Susan Russo Gelbart, Jennifer R. Smith, Felix Gondwe, Dustin Ebert, Juancarlos Chan, J. Michael Cherry, Ceri E. Van Slyke, Christopher J. Tabone, L. Sian Gramates, Madeline A. Crosby, Robert S. Nash, Kevin A. MacPherson, Patrick Ng, Christian Pich, Suzi Aleksander, Monika Tomczuk, Brian R. Calvi, Todd W. Harris, Cynthia L. Smith, Stan Laulederkind, Jyothi Thota, Gilberto dos Santos, Matt Simison, Kimberly Van Auken, Mary E. Dolan, Karen R. Christie, Stacia R. Engel, Leyla Ruzicka, Carol J. Bult, Kevin L. Howe, Stuart R. Miyasato, Shur-Jen Wang, David R. Shaw, Mary Shimoyama, Valerio Arnaboldi, Matthew Russell, Michael Paulini, Sibyl Gao, Sagar Jha, Jeff De Pons, Christopher J. Mungall, Seth Carbon, James A. Kadin, Sierra A. T. Moxon, Susan M. Bello, Thomas C. Kaufman, Laurent-Philippe Albou, Shuai Weng, and Helen Attrill
- Subjects
NAR Breakthrough Article; Saccharomyces cerevisiae; Biology; Genome; Data modeling; Mice; User-Computer Interface; 03 medical and health sciences; 0302 clinical medicine; Resource (project management); Databases, Genetic; Genetics; Animals; Humans; Caenorhabditis elegans; Alleles; Zebrafish; Organism; 030304 developmental biology; Internet; 0303 health sciences; Genome, Human; Computational Biology; Genomics; Data science; Rats; Variety (cybernetics); Drosophila melanogaster; Gene Ontology; Data access; Alliance; Workflow; Software; 030217 neurology & neurosurgery
- Abstract
The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology that is guided by the vision of facilitating exploration of related genes in human and well-studied model organisms by providing a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with gene ontology data and human data. All data types represented in the Alliance portal (e.g. genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include a focus on coverage of additional model organisms including those without dedicated curation communities, and the inclusion of new data types with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.
- Published
- 2019
3. Text mining meets community curation: a newly designed curation platform to improve author experience and participation at WormBase
- Author
-
Paul W. Sternberg, Juancarlos Chan, Kimberly Van Auken, Valerio Arnaboldi, Daniela Raciti, and Hans-Michael Müller
- Subjects
Support Vector Machine; Databases, Factual; Computer science; Process (engineering); Knowledge Bases; media_common.quotation_subject; Data type; General Biochemistry, Genetics and Molecular Biology; Data modeling; User-Computer Interface; 03 medical and health sciences; Animals; Data Mining; Quality (business); Caenorhabditis elegans; Data Curation; 030304 developmental biology; media_common; Internet; 0303 health sciences; Information retrieval; 030302 biochemistry & molecular biology; Pipeline (software); Support vector machine; Original Article; WormBase; User interface; General Agricultural and Biological Sciences; Information Systems
- Abstract
Biological knowledgebases rely on expert biocuration of the research literature to maintain up-to-date collections of data organized in machine-readable form. To enter information into knowledgebases, curators need to follow three steps: (i) identify papers containing relevant data, a process called triaging; (ii) recognize named entities; and (iii) extract and curate data in accordance with the underlying data models. WormBase (WB), the authoritative repository for research data on Caenorhabditis elegans and other nematodes, uses text mining (TM) to semi-automate its curation pipeline. In addition, WB engages its community, via an Author First Pass (AFP) system, to help recognize entities and classify data types in their recently published papers. In this paper, we present a new WB AFP system that combines TM and AFP into a single application to enhance community curation. The system employs string-searching algorithms and statistical methods (e.g. support vector machines (SVMs)) to extract biological entities and classify data types, and it presents the results to authors in a web form where they validate the extracted information, rather than enter it de novo as the previous form required. With this new system, we lessen the burden for authors, while at the same time receive valuable feedback on the performance of our TM tools. The new user interface also links out to specific structured data submission forms, e.g. for phenotype or expression pattern data, giving the authors the opportunity to contribute a more detailed curation that can be incorporated into WB with minimal curator review. Our approach is generalizable and could be applied to additional knowledgebases that would like to engage their user community in assisting with the curation. In the five months succeeding the launch of the new system, the response rate has been comparable with that of the previous AFP version, but the quality and quantity of the data received has greatly improved.
- Published
- 2020
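The abstract of item 3 above mentions string-searching algorithms for extracting biological entities from a paper. A minimal sketch of that idea follows, assuming a plain lexicon lookup with token-boundary checks; the function name `find_entities`, the lexicon, and the sample sentence are hypothetical illustrations, not the WormBase AFP implementation.

```python
import re


def find_entities(text, lexicon):
    """Return lexicon entries found in text as whole tokens, ordered by first occurrence."""
    hits = []
    for name in lexicon:
        # Reject matches embedded in a longer token (e.g. "daf-1" inside "daf-16").
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        match = re.search(pattern, text)
        if match:
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


# Hypothetical sentence and lexicon, for illustration only.
sentence = "We show that daf-16 acts downstream of daf-2, but not daf-12."
found = find_entities(sentence, ["daf-2", "daf-16", "daf-1", "unc-22"])
print(found)
```

In a form like the one the abstract describes, a list such as `found` would be presented to authors for validation rather than entered from scratch.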
4. Automated generation of gene summaries at the Alliance of Genome Resources
- Author
-
the Alliance of Genome Resources, Paul W Sternberg, Mary Shimoyama, Stacia R Engel, Mary E Dolan, Jose M Urbano, Robert S Nash, Juancarlos Chan, Ceri E Van Slyke, Valerio Arnaboldi, and Ranjana Kishore
- Subjects
Information retrieval; Computer science; Information Storage and Retrieval; Molecular Sequence Annotation; Genomics; Gene Annotation; Ontology (information science); General Biochemistry, Genetics and Molecular Biology; Discoverability; Set (abstract data type); Gene Ontology; ComputingMethodologies_PATTERNRECOGNITION; Disease Ontology; Databases, Genetic; Original Article; ComputingMethodologies_GENERAL; General Agricultural and Biological Sciences; Sentence; Natural language; Information Systems
- Abstract
Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages.
- Published
- 2020
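The abstract of item 4 above describes building sentences from per-category templates and grouping terms under common ancestors when a gene has many annotations. The sketch below illustrates both steps under strong simplifying assumptions: the templates, the one-level `parent_of` map, the `max_terms` threshold, and the gene name are all hypothetical, and the information-content and coverage optimization mentioned in the abstract is not modeled.

```python
# Hypothetical sentence templates, one per data category.
TEMPLATES = {
    "function": "{gene} exhibits {terms}.",
    "process": "{gene} is involved in {terms}.",
    "disease": "{gene} is used to study {terms}.",
}


def join_terms(terms):
    """Join terms as 'a, b and c'."""
    terms = list(terms)
    if len(terms) <= 1:
        return "".join(terms)
    return ", ".join(terms[:-1]) + " and " + terms[-1]


def group_by_parent(terms, parent_of, max_terms):
    """If there are more than max_terms terms, replace each by its parent (deduplicated)."""
    if len(terms) <= max_terms:
        return list(terms)
    grouped = []
    for term in terms:
        parent = parent_of.get(term, term)  # terms without a known parent are kept as-is
        if parent not in grouped:
            grouped.append(parent)
    return grouped


def summarize(gene, category, terms, parent_of=None, max_terms=3):
    """Fill the category template with the (possibly grouped) term list."""
    terms = group_by_parent(terms, parent_of or {}, max_terms)
    return TEMPLATES[category].format(gene=gene, terms=join_terms(terms))


# Hypothetical ontology fragment and annotations.
parents = {
    "axon guidance": "neuron development",
    "dendrite development": "neuron development",
    "muscle contraction": "muscle system process",
    "muscle relaxation": "muscle system process",
}
short = summarize("gene-1", "process", ["locomotion", "egg laying"])
long = summarize("gene-1", "process", list(parents), parent_of=parents)
print(short)
print(long)
```

Grouping only one level up keeps the example short; a real ontology is a graph with multiple parents per term, which is why the abstract speaks of traversing ontology graphs.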
5. WormBase: a modern Model Organism Information Resource
- Author
-
Christian A. Grove, Faye H. Rodgers, Gary Williams, Wen J. Chen, Jaehyoung Cho, Paul W. Sternberg, Karen Yook, Valerio Arnaboldi, Matthew Russell, Sibyl Gao, Lincoln Stein, Raymond Lee, Qinghua Wang, Juancarlos Chan, Paul Davis, Paulo A. S. Nuin, Daniela Raciti, Michael Paulini, Adam Wright, Kevin L. Howe, Gary Schindelman, Hans-Michael Müller, Scott Cain, Tim Schedl, Cecilia Nakamura, Ranjana Kishore, Kimberly Van Auken, and Todd W. Harris
- Subjects
Interface (Java); Cloud computing; Biology; World Wide Web; 03 medical and health sciences; User-Computer Interface; 0302 clinical medicine; Databases, Genetic; Genetics; Animals; Data Mining; Database Issue; Architecture; Caenorhabditis elegans; Genes, Helminth; 030304 developmental biology; 0303 health sciences; Internet; business.industry; Genomics; biology.organism_classification; Workflow; The Internet; WormBase; User interface; business; 030217 neurology & neurosurgery
- Abstract
WormBase (https://wormbase.org/) is a mature Model Organism Information Resource supporting researchers using the nematode Caenorhabditis elegans as a model system for studies across a broad range of basic biological processes. Toward this mission, WormBase efforts are arranged in three primary facets: curation, user interface and architecture. In this update, we describe progress in each of these three areas. In particular, we discuss the status of literature curation and recently added data, detail new features of the web interface and options for users wishing to conduct data mining workflows, and discuss our efforts to build a robust and scalable architecture by leveraging commercial cloud offerings. We conclude with a description of WormBase's role as a founding member of the nascent Alliance of Genome Resources.
- Published
- 2019
6. WormBase 2017: molting into a new stage
- Author
-
Mary Ann Tuli, Todd W. Harris, Ranjana Kishore, Paul Davis, Raymond Lee, Lincoln Stein, Tim Schedl, Kevin L. Howe, Paul W. Sternberg, Adam Wright, Gary Williams, Qinghua Wang, Faye H. Rodgers, Gary Schindelman, Matthew Berriman, Daniela Raciti, Hans-Michael Müller, Christian A. Grove, Sibyl Gao, Juancarlos Chan, Paulo A. S. Nuin, Valerio Arnaboldi, Matthew Russell, Michael Paulini, Cecilia Nakamura, Scott Cain, Kimberly Van Auken, Wen J. Chen, Karen Yook, and Paul J. Kersey
- Subjects
0301 basic medicine; Nematoda; Process (engineering); Datasets as Topic; Information Storage and Retrieval; Biology; Ontology (information science); Web Browser; Bioinformatics; Genome; World Wide Web; 03 medical and health sciences; User-Computer Interface; 0302 clinical medicine; Databases, Genetic; Genetics; Database Issue; Animals; Data Mining; Humans; Caenorhabditis elegans; Data Curation; Publishing; Data curation; biology.organism_classification; Disease Models, Animal; 030104 developmental biology; Gene Ontology; Platyhelminths; Scalability; Caenorhabditis; RNA Interference; WormBase; Sequence Alignment; 030217 neurology & neurosurgery; Forecasting
- Abstract
WormBase (http://www.wormbase.org) is an important knowledge resource for biomedical researchers worldwide. To accommodate the ever increasing amount and complexity of research data, WormBase continues to advance its practices on data acquisition, curation and retrieval to most effectively deliver comprehensive knowledge about Caenorhabditis elegans, and genomic information about other nematodes and parasitic flatworms. Recent notable enhancements include user-directed submission of data, such as micropublication; genomic data curation and presentation, including additional genomes and JBrowse, respectively; new query tools, such as SimpleMine, Gene Enrichment Analysis; new data displays, such as the Person Lineage browser and the Summary of Ontology-based Annotations. Anticipating more rapid data growth ahead, WormBase continues the process of migrating to a cutting-edge database technology to achieve better stability, scalability, reproducibility and a faster response time. To better serve the broader research community, WormBase, with five other Model Organism Databases and The Gene Ontology project, have begun to collaborate formally as the Alliance of Genome Resources.
- Published
- 2018
7. The Gene Ontology Resource: 20 years and still GOing strong
- Author
-
Rebecca Tauber, Robert J. Dodson, Marek S. Skrzypek, Raymond Lee, Valerie Wood, Paul W. Sternberg, C. Rivoire, Nancy H. Campbell, E. Hatton-Ellis, M. Rodriguez-Lopez, Elena Speretta, D. S. Osumi, Alix J. Rey, A. Mac-Dougall, Jane E. Mendel, Christopher J. Mungall, Helen Parkinson, Maria Jesus Martin, Pascale Gaudet, A. Stutz, Nathan Dunn, Gillian Millburn, Kate Warner, K. Axelsen, C. Arighi, Mary E. Dolan, M. J. Kesling, Barbara Kramarz, Seth Carbon, Joshua L. Goodman, Rachael P. Huntley, Anjali Shrivastava, Daniela Raciti, C. Wu, Victor B. Strelets, Steven J Marygold, H. Drabkin, M. Magrane, Benjamin M. Good, A. Shrivatsav Vp, Lorna Richardson, James P. Balhoff, P. Lemercier, E. Bakker, Amaia Sangrador-Vegas, Marc Feuermann, Paul Thomas, D. Lieberherr, J. Cho, Hans-Michael Müller, Robert S. Nash, Leonore Reiser, Birgit H M Meldal, Neil D. Rawlings, N. N. Hyka, D. A. Natale, Paola Roncaglia, Paul Denny, Michelle G. Giglio, Judith A. Blake, S. Sundaram, Shankar Subramaniam, Marcus C. Chibucos, Kevin A. MacPherson, S. Poux, Karen R. Christie, Mary Shimoyama, Eva Huala, Colin Logie, Huaiyu Mi, Felix Gondwe, K. Pichler, Petra Fey, Deborah A. Siegele, Phani V. Garapati, N. Tyagi, J L De Pons, Alex Bateman, Melinda R. Dwinell, Pablo Porras, Giulia Antonazzo, Midori A. Harris, Y. Lussi, Stuart R. Miyasato, Li Ni, K. Laiho, A. Estreicher, Travis K. Sheppard, Edith D. Wong, M. C. Harrison, H. Chen, S. Basu, Sandra A. LaBonte, Margaret Duesbury, E. Hartline, Sibyl Gao, Vítor Trovisco, Jacqueline Hayles, George Georghiou, Rex L. Chisholm, Kathleen Falls, S. Poudel, James C. Hu, G. T. Hayman, Kim Rutherford, F. Jungo, Hsin-Yu Chang, E. Boutet, Robert D. Finn, Alex L. Mitchell, Stan Laulederkind, J. H. Rawson, Marek Tutaj, Vanessa Acquaah, Peter D'Eustachio, G. Keller, L. Breuza, P. Garmiri, Nicholas H. Brown, Laurent-Philippe Albou, Antonia Lock, Nomi L. Harris, U. Hinz, Matthew Berriman, R. Britto, Rossana Zaru, Suzanna E. Lewis, N. Gruaz-Gumowski, Livia Perfetto, Matt Simison, Martin Kuiper, Shuai Weng, M. Tognolli, G. Dos Santos, Elizabeth R Bolton, Xiaosong Huang, A. Gos, P. Masson, David B. Emmert, Lisa Matthews, C. Casals-Casas, Kevin L. Howe, N. T. Del, Sandra Orchard, L. Famiglietti, Doug Howe, T. Sawford, T. E.M. Jones, Stephen G. Oliver, Kalpana Karra, S. Fexova, Tremayne Mushayahama, Dustin Ebert, Jim Thurmond, Ruth C. Lovering, E. Coudert, A. Bridge, Suzi Aleksander, Suvarna Nadendla, Christian A. Grove, David P. Hill, J. M. Cherry, M. C. Blatter, K. Van Auken, H. Bye-A-Jee, B. L. Dunn, A. Lreid, Sabrina Toro, Monte Westerfield, Z. Xie, A. Auchincloss, I. Pedruzzi, Anushya Muruganujan, B. Bely, S. H. Ahmad, Stacia R. Engel, Shur-Jen Wang, Gail Binkley, Lincoln Stein, Pinglei Zhou, G. P. Argoud, Marcio Luis Acencio, C. Hulo, Jürg Bähler, Juancarlos Chan, P. C. Ng, Helen Attrill, Mélanie Courtot, A. Ignatchenko, Tanya Z. Berardini, D. Sitnikov, Eric Douglass, and A. Shypitsyna
- Subjects
Quality Control; media_common.quotation_subject; Ontology (information science); Biology; History, 21st Century; Filter (software); Unique identifier; World Wide Web; 03 medical and health sciences; 0302 clinical medicine; Resource (project management); Web page; Genetics; Animals; Humans; Database Issue; Quality (business); Function (engineering); Molecular Biology; 030304 developmental biology; media_common; 0303 health sciences; Focus (computing); Bacteria; Eukaryota; Molecular Sequence Annotation; History, 20th Century; High-Throughput Screening Assays; Gene Ontology; Mitogen-Activated Protein Kinases; 030217 neurology & neurosurgery
- Abstract
The Gene Ontology resource (GO; http://geneontology.org) provides structured, computable knowledge regarding the functions of genes and gene products. Founded in 1998, GO has become widely adopted in the life sciences, and its contents are under continual improvement, both in quantity and in quality. Here, we report the major developments of the GO resource during the past two years. Each monthly release of the GO resource is now packaged and given a unique identifier (DOI), enabling GO-based analyses on a specific release to be reproduced in the future. The molecular function ontology has been refactored to better represent the overall activities of gene products, with a focus on transcription regulator activities. Quality assurance efforts have been ramped up to address potentially out-of-date or inaccurate annotations. New evidence codes for high-throughput experiments now enable users to filter out annotations obtained from these sources. GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models. We also provide the ‘GO ribbon’ widget for visualizing GO annotations to a gene; the widget can be easily embedded in any web page.
- Published
- 2018
8. WormBase 2016: expanding to enable helminth genomic research
- Author
-
Gary Williams, Daniela Raciti, James Done, Christian A. Grove, Paul J. Kersey, Hans-Michael Müller, Adam Wright, Lincoln Stein, Daniel Wang, Scott Cain, Juancarlos Chan, Tim Schedl, Todd W. Harris, Bruce J. Bolt, Paul Davis, Wen J. Chen, Paulo A. S. Nuin, Xiaodong Wang, Karen Yook, Kevin L. Howe, Jane Lomax, Mary Ann Tuli, Eleanor J Stanley, Paul W. Sternberg, Ranjana Kishore, Raymond Lee, Kimberly Van Auken, Michael Paulini, Thomas A. Down, Cecilia Nakamura, Matthew Berriman, Yuling Li, Gary Schindelman, and Sibyl Gao
- Subjects
0301 basic medicine; Nematoda; WormBook; Genomics; Computational biology; Ontology (information science); Bioinformatics; Genome; 03 medical and health sciences; Databases, Genetic; Genetics; Animals; Database Issue; Helminths; Caenorhabditis elegans; Genes, Helminth; Genome, Helminth; biology; Molecular Sequence Annotation; biology.organism_classification; 030104 developmental biology; Platyhelminths; WormBase; Software
- Abstract
WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research.
- Published
- 2015
9. Tissue enrichment analysis for C. elegans genomics
- Author
-
Raymond Lee, Paul W. Sternberg, Juancarlos Chan, and David Angeles-Albores
- Subjects
0301 basic medicine; High-throughput biology; Genomics; RNA-Seq; Computational biology; Biology; Biochemistry; 03 medical and health sciences; Structural Biology; Animals; Caenorhabditis elegans; Molecular Biology; Gene; WormBase; High throughput biology; Sequence Analysis, RNA; Methodology Article; Gene Expression Profiling; Applied Mathematics; food and beverages; biology.organism_classification; Computer Science Applications; Gene expression profiling; Gene Ontology; 030104 developmental biology; Anatomy ontology; RNA-seq; DNA microarray
- Abstract
Background: Over the last ten years, there has been explosive development in methods for measuring gene expression. These methods can identify thousands of genes altered between conditions, but understanding these datasets and forming hypotheses based on them remains challenging. One way to analyze these datasets is to associate ontologies (hierarchical, descriptive vocabularies with controlled relations between terms) with genes and to look for enrichment of specific terms. Although Gene Ontology (GO) is available for Caenorhabditis elegans, it does not include anatomical information. Results: We have developed a tool for identifying enrichment of C. elegans tissues among gene sets and generated a website GUI where users can access this tool. Since a common drawback to ontology enrichment analyses is its verbosity, we developed a very simple filtering algorithm to reduce the ontology size by an order of magnitude. We adjusted these filters and validated our tool using a set of 30 gold standards from Expression Cluster data in WormBase. We show our tool can even discriminate between embryonic and larval tissues and can even identify tissues down to the single-cell level. We used our tool to identify multiple neuronal tissues that are down-regulated due to pathogen infection in C. elegans. Conclusions: Our Tissue Enrichment Analysis (TEA) can be found within WormBase, and can be downloaded using Python’s standard pip installer. It tests a slimmed-down C. elegans tissue ontology for enrichment of specific terms and provides users with a text and graphic representation of the results. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1229-9) contains supplementary material, which is available to authorized users.
- Published
- 2016
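The abstract of item 9 above describes testing tissue terms for enrichment in a gene set but does not name the statistic. The sketch below assumes a one-sided hypergeometric test, a common choice for term enrichment; it is not the TEA package's API, the function names and toy data are hypothetical, and multiple-testing correction and the ontology filtering step are omitted.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def tissue_enrichment(gene_list, annotations, background):
    """Return {tissue: p-value} for over-representation of each tissue in gene_list."""
    universe = set(background)
    genes = set(gene_list) & universe
    results = {}
    for tissue, annotated in annotations.items():
        annotated = set(annotated) & universe
        overlap = len(genes & annotated)
        results[tissue] = hypergeom_sf(overlap, len(universe), len(annotated), len(genes))
    return results


# Toy data, for illustration only.
background = [f"g{i}" for i in range(1, 11)]
annotations = {"neuron": ["g1", "g2", "g3"], "muscle": ["g4", "g5", "g6"]}
pvals = tissue_enrichment(["g1", "g2", "g3"], annotations, background)
print(pvals)
```

With many tissue terms tested at once, raw p-values like these would normally be adjusted for multiple comparisons before being reported.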
10. Additional file 1 of Tissue enrichment analysis for C. elegans genomics
- Author
-
David Angeles-Albores, Raymond N. Lee, Juancarlos Chan, and Paul Sternberg
- Subjects
GeneralLiterature_INTRODUCTORYANDSURVEY; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
TEA Tutorial. Tutorial for users interested in using our software within a python script. (PDF 161 kb)
- Published
- 2016
11. The Gene Ontology: enhancements for 2011
- Author
-
P D'Eustachio, Benjamin C. Hitz, Julie Park, Paul Browne, Douglas G. Howe, Cynthia J. Krieger, Kalpana Karra, Stan Laulederkind, Karen R. Christie, Susan Tweedie, Eurie L. Hong, Lydie Bougueleret, Michele Magrane, Cathy R. Gresham, Rolf Apweiler, Lisa Matthews, Dong Li, Philippa J. Talmud, Ioannis Xenarios, J. M. Cherry, Tanya Z. Berardini, Deborah A. Siegele, Rama Balakrishnan, D. Sitnikov, A. Auchinchloss, Selina S. Dwight, Tony Sawford, Paul J. Kersey, Ruth C. Lovering, Ruth Y. Eberhardt, Ursula Hinz, Lakshmi Pillai, Sylvain Poux, Edith D. Wong, Klemens Pichler, Kati Laiho, Malcolm J. Gardner, Stephen G. Oliver, Lionel Breuza, Kara Dolinski, P Lemercier, Kristian B. Axelsen, Midori A. Harris, Adrienne E. Zweifel, H. Drabkin, Guillaume Keller, Marek S. Skrzypek, Daniel M. Staines, Fiona M. McCarthy, Nicholas H. Brown, Mark D. McDowall, Antonia Lock, Mary Shimoyama, Maria C. Costanzo, Teresia Buza, S. Jimenez, Rex L. Chisholm, Paul W. Sternberg, Hui Wang, Nadine Gruaz-Gumowski, Chantal Hulo, Rebecca E. Foulger, Melinda R. Dwinell, Judith A. Blake, Marcus C. Chibucos, B. K. McIntosh, C. D. Amundsen, Jane Lomax, L Famiglietti, Tom Hayman, Michael Tognolli, Eva Huala, James C. Hu, Patrick Masson, Maria Jesus Martin, Benoit Bely, Shuai Weng, Heather C. Wick, E. Dimmer, L. Ni, Catherine Rivoire, Christopher J. Mungall, H. Sehra, P. Duek-Roggli, Maria Victoria Schneider, Dianna G. Fisk, Michael S. Livstone, Ivo Pedruzzi, Shyamala Sundaram, Donna K. Slonim, Isabelle Cusin, Stuart R. Miyasato, Timothy F. Lowry, Varsha K. Khodiyar, Seth Carbon, Elisabeth Coudert, Jürg Bähler, Juancarlos Chan, Evelyn Camon, Daniel P. Renfro, Anne Estreicher, M. C. Blatter, Robert S. Nash, P Gaudet, Sven Heinicke, K. Van Auken, Stacia R. Engel, Alan Bridge, Ralf Stephan, Mary E. Dolan, Shane C. Burgess, Petra Fey, Shur-Jen Wang, Damien Lieberherr, Duncan Legge, P. Porras Millán, Andre Stutz, Yasmin Alam-Faruque, Gail Binkley, Bernd Roechert, S. Branconi-Quintaje, Ghislaine Argoud-Puy, S. Basu, Kim Rutherford, M. Moinat, Monte Westerfield, Arnaud Gos, Eleanor J Stanley, Valerie Wood, Ranjana Kishore, Diego Poggioli, S. Ferro-Rojas, Victoria Petri, Florence Jungo, Suzanna E. Lewis, Emmanuel Boutet, Warren A. Kibbe, M Feuermann, Claire O'Donovan, W. M. Chan, J. James, David P. Hill, Rachael P. Huntley, M. Gwinn Giglio, Paul Thomas, Jodi E. Hirschman, Paola Roncaglia, and Gene Ontology Consortium
- Subjects
Quality Control; 0303 health sciences; media_common.quotation_subject; Databases, Genetic; Molecular Sequence Annotation/standards; Vocabulary, Controlled; Inference; Molecular Sequence Annotation; Articles; Biology; Ontology (information science); World Wide Web; Open Biomedical Ontologies; 03 medical and health sciences; Annotation; 0302 clinical medicine; Resource (project management); Controlled vocabulary; Genetics; Social media; Function (engineering); 030217 neurology & neurosurgery; 030304 developmental biology; media_common
- Abstract
The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.
- Published
- 2011
12. WormBase: new content and better access
- Author
-
Todd W. Harris, Juancarlos Chan, Hans-Michael Müller, Igor Antoshechkin, Darin Blasiar, Tristan J. Fiedler, Raymond Lee, Tamberlyn Bieri, Philip Ozersky, Arun Rangarajan, Lincoln Stein, Lisa R. Girard, William Spooner, Gary Williams, Sheldon J. McKay, Erich M. Schwarz, Ranjana Kishore, Anthony Rogers, Richard Durbin, Nansheng Chen, Wen J. Chen, Gary Schindelman, Kimberly Van Auken, Carol Bastiani, Payan Canaran, Cecilia Nakamura, Michael Han, Mary Ann Tuli, John Spieth, Paul Davis, Xiaodong Wang, Daniel Wang, Andrei Petcherski, and Paul W. Sternberg
- Subjects
WormBook; Genomics; Genome browser; Computational biology; User-Computer Interface; 03 medical and health sciences; Annotation; 0302 clinical medicine; Databases, Genetic; Genetics; Animals; Caenorhabditis elegans; Genes, Helminth; Oligonucleotide Array Sequence Analysis; 030304 developmental biology; Genome, Helminth; Internet; 0303 health sciences; biology; Articles; biology.organism_classification; Caenorhabditis; Fosmid; Phenotype; RNA Interference; WormBase; Caltech Library Services; 030217 neurology & neurosurgery
- Abstract
WormBase (http://wormbase.org), a model organism database for Caenorhabditis elegans and other related nematodes, continues to evolve and expand. Over the past year WormBase has added new data on C. elegans, including data on classical genetics, cell biology and functional genomics; expanded the annotation of closely related nematodes with a new genome browser for Caenorhabditis remanei; and deployed new hardware for stronger performance. Several existing datasets including phenotype descriptions and RNAi experiments have seen a large increase in new content. New datasets such as the C. remanei draft assembly and annotations, the Vancouver Fosmid library and TEC-RED 5' end sites are now available as well. Access to and searching WormBase has become more dependable and flexible via multiple mirror sites and indexing through Google.
- Published
- 2006
13. WormBase: a comprehensive data resource for Caenorhabditis biology and genomics
- Author
-
Paul H. Davis, Philip Ozersky, Kimberly Van Auken, Igor Antoshechkin, Erich M. Schwarz, Eimear E. Kenny, Daniel Lawson, Keith Bradnam, Lincoln Stein, Paul W. Sternberg, Wen J. Chen, Anthony Rogers, Qinghua Wang, Juancarlos Chan, Carol Bastiani, Nansheng Chen, Andrei Petcherski, Shraddha Pai, Darin Blasiar, Chao-Kung Chen, Ranjana Kishore, John Spieth, Cecilia Nakamura, Hans-Michael Müller, Payan Canaran, Fiona Cunningham, Tamberlyn Bieri, Raymond Lee, Aniko Sabo, Richard Durbin, and Todd W. Harris
- Subjects
WormBook ,Protein Conformation ,Genomics ,Computational biology ,Biology ,Bioinformatics ,Genome ,User-Computer Interface ,03 medical and health sciences ,0302 clinical medicine ,Two-Hybrid System Techniques ,Databases, Genetic ,Genetics ,Animals ,Caenorhabditis elegans ,Caenorhabditis elegans Proteins ,030304 developmental biology ,Database model ,0303 health sciences ,Gene Expression Profiling ,Articles ,biology.organism_classification ,Systems Integration ,Caenorhabditis ,WormBase ,User interface ,Software ,Caltech Library Services ,030217 neurology & neurosurgery - Abstract
WormBase (http://www.wormbase.org), the model organism database for information about Caenorhabditis elegans and related nematodes, continues to expand in breadth and depth. Over the past year, WormBase has added multiple large-scale datasets including SAGE, interactome, 3D protein structure datasets and NCBI KOGs. To accommodate this growth, the International WormBase Consortium has improved the user interface by adding new features to aid in navigation, visualization of large-scale datasets, advanced searching and data mining. Internally, we have restructured the database models to rationalize the representation of genes and to prepare the system to accept the genome sequences of three additional Caenorhabditis species over the coming year.
- Published
- 2004
14. WormBase 2014: new views of curated biology
- Author
-
Todd W. Harris, Tamberlyn Bieri, Kimberly Van Auken, J. D. Wong, Tim Schedl, Christian A. Grove, Mary Ann Tuli, Yuling Li, Raymond Lee, Gary Williams, Gary Schindelman, Juancarlos Chan, Kevin L. Howe, John Spieth, Lincoln Stein, Paul J. Kersey, Joachim Baran, Philip Ozersky, Jonathan Hodgkin, Daniel Wang, Daniela Raciti, Hans-Michael Müller, Ranjana Kishore, James Done, Abigail Cabunoc, Paul W. Sternberg, Paul H. Davis, Matthew Berriman, Xiaodong Wang, Wen J. Chen, Karen Yook, Michael Paulini, and Cecilia Nakamura
- Subjects
Genetics ,Genome, Helminth ,Internet ,Nematoda ,biology ,WormBook ,business.industry ,Molecular Sequence Annotation ,V. Human genome, model organisms, comparative genomics ,Coping behavior ,biology.organism_classification ,Manual curation ,World Wide Web ,Databases, Genetic ,Animals ,The Internet ,WormBase ,Caenorhabditis elegans ,business ,Web site - Abstract
WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.
- Published
- 2014
15. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR
- Author
-
Petra Fey, Donghui Li, Robert J. Dodson, Kimberly Van Auken, S. Basu, Tanya Z. Berardini, Eva Huala, Hans-Michael Müller, Laurel Cooper, Rex L. Chisholm, Juancarlos Chan, Yuling Li, and Paul W. Sternberg
- Subjects
Arabidopsis ,Computational biology ,Ontology (information science) ,Biology ,General Biochemistry, Genetics and Molecular Biology ,BioCreative Virtual Issue ,Workflow ,World Wide Web ,Access to Information ,03 medical and health sciences ,Databases, Genetic ,Animals ,Data Mining ,Dictyostelium ,Caenorhabditis elegans ,030304 developmental biology ,0303 health sciences ,030306 microbiology ,Gene ontology ,The Arabidopsis Information Resource ,Molecular Sequence Annotation ,DictyBase ,Multiple data ,WormBase ,Periodicals as Topic ,General Agricultural and Biological Sciences ,Information Systems - Abstract
WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.
- Published
- 2012
16. WormBase 2012: more genomes, more data, new website
- Author
-
Ranjana Kishore, Matthew Berriman, Snehalata Kadam, Xiaoqi Shi, Daniel Wang, Uma Ganesan, Paul J. Kersey, Bill Nash, Juancarlos Chan, Michael Paulini, Kimberly Van Auken, Cecilia Nakamura, Lincoln Stein, Paul W. Sternberg, Christian A. Grove, Arun Rangarajan, Todd W. Harris, John Spieth, Gary Williams, Erich M. Schwarz, Xiaodong Wang, Daniela Raciti, Richard Durbin, Mary Ann Tuli, Hans-Michael Müller, Philip Ozersky, Adrian Duong, Raymond Lee, Tamberlyn Bieri, Paul H. Davis, Kevin L. Howe, Abigail Cabunoc, Jonathan Hodgkin, Yuling Li, Ruihua Fang, Gary Schindelman, Wen J. Chen, Karen Yook, and Norie De La Cruz
- Subjects
Nematoda ,Scientific literature ,Biology ,World Wide Web ,03 medical and health sciences ,0302 clinical medicine ,Resource (project management) ,Databases, Genetic ,Genetics ,Computer Graphics ,Animals ,Caenorhabditis elegans ,030304 developmental biology ,0303 health sciences ,Genome, Helminth ,Internet ,Application programming interface ,business.industry ,Gene Expression Profiling ,Usability ,Molecular Sequence Annotation ,Genomics ,Articles ,Identification (information) ,Phenotype ,Data extraction ,Caenorhabditis ,The Internet ,WormBase ,business ,030217 neurology & neurosurgery - Abstract
Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community. © The Author(s) 2011.
- Published
- 2011
17. Publishing Interactive Articles: Integrating Journals And Biological Databases
- Author
-
Lolly Otis, William M. Gelbart, Tim Schedl, Hans-Michael Müller, Steven J Marygold, Marek S. Skrzypek, Paul W. Sternberg, Arun Rangarajan, Tracey DePellegrin-Connelly, Stephen Haenel, Mike Cherry, Karen Yook, Juancarlos Chan, and Sharon Faelten
- Subjects
Information retrieval ,Bioinformatics ,Computer science ,Data Standards ,Biological database ,Genetics & Genomics ,Object (computer science) ,computer.software_genre ,Data type ,Pipeline (software) ,Scripting language ,Web page ,General Materials Science ,WormBase ,FlyBase : A Database of Drosophila Genes & Genomes ,computer - Abstract
In collaboration with the journal GENETICS, we've developed and launched a pipeline by which interactive full-text HTML/PDF journal articles are published with named entities linked to corresponding resource pages in WormBase (WB; http://www.wormbase.org/). Our interactive articles allow a reader to click on over ten different data type objects (gene, protein, transgene, etc.) and be directed to the relevant webpage. This seamless connection from the article to summaries of data types promotes a deeper level of understanding for the naïve reader, and incisive evaluation for the sophisticated reader. Further, this collaboration allows us to identify and collect information before the publication of the article. The pipeline uses automated recognition scripts to identify entities that already exist in the database and a self-reporting form we created at WB that is sent to the author by GENETICS for submitting entities that do not already exist in our database. We include a manual quality control step to make sure ambiguous links are corrected, and that all new entities have been reported and linked properly. The automated entity recognition scripts allow us to potentially link any object found in a database as well as to expand this pipeline to other databases. We have already adapted this pipeline for linking Saccharomyces cerevisiae GENETICS articles to the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) and are currently expanding this pipeline for linking genes in Drosophila articles to FlyBase (http://flybase.org/). By integrating journals and databases, we are integrating the major modes of communication in the biological sciences, which will undoubtedly increase the pace of discovery.
- Published
- 2010
18. Publishing Interactive Articles: Integrating Journals And Biological Databases
- Author
-
Karen Yook, Arun Rangarajan, Hans-Michael Muller, Paul Sternberg, Tracey DePellegrin-Connelly, Tim Schedl, Mike Cherry, William Gelbart, Juancarlos Chan, Stephen Haenel, Lolly Otis, Sharon Faelten, Marek Skrzypek, and Steven Marygold
- Subjects
General Materials Science - Published
- 2010
19. WormBase: a comprehensive resource for nematode research
- Author
-
Erich M. Schwarz, Mary Ann Tuli, Michael Han, Paul H. Davis, Juancarlos Chan, Xiaodong Wang, Ranjana Kishore, Norie De La Cruz, Igor Antoshechkin, Raymond Lee, Kimberly Van Auken, Arun Rangarajan, Darin Blasiar, Richard Durbin, Tamberlyn Bieri, Cecilia Nakamura, Philip Ozersky, Hans-Michael Müller, Margaret Duesbury, Ruihua Fang, Todd W. Harris, John Spieth, Paul W. Sternberg, Gary Schindelman, Lincoln Stein, Anthony Rogers, Gary Williams, Wen J. Chen, Karen Yook, Jolene Fernandes, Daniel Wang, and Andrei Petcherski
- Subjects
WormBook ,Information Storage and Retrieval ,Computational biology ,Information repository ,Genome ,03 medical and health sciences ,0302 clinical medicine ,Resource (project management) ,Databases, Genetic ,Genetics ,Animals ,Caenorhabditis elegans ,Databases, Protein ,Alleles ,030304 developmental biology ,0303 health sciences ,Internet ,biology ,Computational Biology ,Articles ,biology.organism_classification ,Protein Structure, Tertiary ,Caenorhabditis ,Phenotype ,WormBase ,Databases, Nucleic Acid ,030217 neurology & neurosurgery ,Software ,Transcription Factors - Abstract
WormBase (http://www.wormbase.org) is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.
- Published
- 2010
20. WormBase: better software, richer content
- Author
-
Paul Davis, Juancarlos Chan, William Spooner, Erich M. Schwarz, Todd W. Harris, Tamberlyn Bieri, Raymond Lee, Nansheng Chen, Lincoln Stein, Carol Bastiani, Daniel Wang, Phil Ozersky, John Spieth, Hans-Michael Müller, Daniel Lawson, Tristan J. Fiedler, Richard Durbin, Andrei Petcherski, Mary Ann Tuli, Kimberly Van Auken, Paul W. Sternberg, Ranjana Kishore, Payan Canaran, Igor Antoshechkin, Lisa-Christine Girard, Anthony Rogers, Eimear E. Kenny, Wen J. Chen, Cecilia Nakamura, and Darin Blasiar
- Subjects
DNA, Complementary ,Genomics ,Genome browser ,Computational biology ,Polymorphism, Single Nucleotide ,Article ,User-Computer Interface ,03 medical and health sciences ,Software portability ,0302 clinical medicine ,Databases, Genetic ,Genetics ,Animals ,Caenorhabditis elegans ,Caenorhabditis elegans Proteins ,Gene ,030304 developmental biology ,Expressed Sequence Tags ,Genome, Helminth ,Internet ,0303 health sciences ,Expressed sequence tag ,biology ,biology.organism_classification ,Data structure ,ComputingMethodologies_PATTERNRECOGNITION ,RNA Interference ,WormBase ,Software ,030217 neurology & neurosurgery ,Caltech Library Services - Abstract
WormBase (http://wormbase.org), the public database for genomics and biology of Caenorhabditis elegans, has been restructured for stronger performance and expanded for richer biological content. Performance was improved by accelerating the loading of central data pages such as the omnibus Gene page, by rationalizing internal data structures and software for greater portability, and by making the Genome Browser highly customizable in how it views and exports genomic subsequences. Arbitrarily complex, user-specified queries are now possible through Textpresso (for all available literature) and through WormMart (for most genomic data). Biological content was enriched by reconciling all available cDNA and expressed sequence tag data with gene predictions, clarifying single nucleotide polymorphism and RNAi sites, and summarizing known functions for most genes studied in this organism.
- Published
- 2006
21. WormBase: a multi-species resource for nematode biology and genomics
- Author
-
Raymond Lee, Lincoln Stein, Chao-Kung Chen, Aniko Sabo, John Spieth, Todd W. Harris, Carol Bastiani, Wen J. Chen, Marcela K. Tello-Ruiz, Hans-Michael Müller, Philip Ozersky, Richard Durbin, Eimear E. Kenny, Igor Antoshechkin, Qinghua Wang, Tamberlyn Bieri, Juancarlos Chan, Keith Bradnam, Andrei Petcherski, Darin Blasiar, Paul H. Davis, Anthony Rogers, Fiona Cunningham, Ranjana Kishore, Paul W. Sternberg, Cecilia Nakamura, Daniel Lawson, Nansheng Chen, Kimberly Van Auken, and Erich M. Schwarz
- Subjects
Comparative genomics ,Caenorhabditis briggsae ,Genetics ,Internet ,biology ,WormBook ,Computational Biology ,Information Storage and Retrieval ,Context (language use) ,Genomics ,Computational biology ,Articles ,biology.organism_classification ,User-Computer Interface ,Databases, Genetic ,Caenorhabditis ,Animals ,WormBase ,Caenorhabditis elegans ,Caltech Library Services ,Synteny - Abstract
WormBase (http://www.wormbase.org/) is the central data repository for information about Caenorhabditis elegans and related nematodes. As a model organism database, WormBase extends beyond the genomic sequence, integrating experimental results with extensively annotated views of the genome. The WormBase Consortium continues to expand the biological scope and utility of WormBase with the inclusion of large-scale genomic analyses, through active data and literature curation, through new analysis and visualization tools, and through refinement of the user interface. Over the past year, the nearly complete genomic sequence and comparative analyses of the closely related species Caenorhabditis briggsae have been integrated into WormBase, including gene predictions, ortholog assignments and a new synteny viewer to display the relationships between the two species. Extensive site-wide refinement of the user interface now provides quick access to the most frequently accessed resources and a consistent browsing experience across the site. Unified single-page views now provide complete summaries of commonly accessed entries like genes. These advances continue to increase the utility of WormBase for C.elegans researchers, as well as for those researchers exploring problems in functional and comparative genomics in the context of a powerful genetic system.
- Published
- 2004
22. WormBase: a cross-species database for comparative genomics
- Author
-
Andrei Petcherski, Allen Day, Lincoln Stein, Daniel Lawson, Juancarlos Chan, Keith Bradnam, Fiona Cunningham, Paul W. Sternberg, Gudmundur A. Thorisson, Darin Blasier, Richard Durbin, Todd W. Harris, Chao-Kung Chen, Ranjana Kishore, John Spieth, Raymond Lee, Tamberlyn Bieri, Wen Chen, Hans-Michael Müller, Anthony Rogers, Eimear E. Kenny, and Erich M. Schwarz
- Subjects
Quality Control ,WormBook ,Gene Expression ,Information Storage and Retrieval ,Context (language use) ,Genomics ,Genome browser ,Information repository ,Biology ,Polymorphism, Single Nucleotide ,World Wide Web ,Documentation ,Sequence Homology, Nucleic Acid ,Genetics ,Animals ,Caenorhabditis elegans ,Comparative genomics ,Expressed Sequence Tags ,Neurons ,Data Collection ,Articles ,DNA, Helminth ,Caenorhabditis ,RNA Interference ,WormBase ,RNA, Helminth ,Databases, Nucleic Acid ,Caltech Library Services - Abstract
WormBase (http://www.wormbase.org/) is a web-accessible central data repository for information about Caenorhabditis elegans and related nematodes. The past two years have seen a significant expansion in the biological scope of WormBase, including the integration of large-scale, genome-wide data sets, the inclusion of genome sequence and gene predictions from related species and active literature curation. This expansion of data has also driven the development and refinement of user interfaces and operability, including a new Genome Browser, new searches and facilities for data access and the inclusion of extensive documentation. These advances have expanded WormBase beyond the obvious target audience of C. elegans researchers, to include researchers wishing to explore problems in functional and comparative genomics within the context of a powerful genetic system.
- Published
- 2003
23. Textpresso - an Information Retrieval and Extraction System for Biological Literature
- Author
-
Hans-Michael Mueller, Arun Rangarajan, Tracy K. Teal, Kimberly van Auken, Juancarlos Chan, and Paul W. Sternberg
- Abstract
We developed an information retrieval and extraction system that processes the full text of biological papers. The system, called Textpresso, separates text into sentences, labels words and phrases according to an ontology (an organized lexicon), and allows queries to be performed on a database of labeled sentences. The current ontology comprises approximately one hundred categories of terms, such as "gene", "regulation", "human disease", "brain area" etc., and also contains main Gene Ontology (GO) categories. Extraction of particular biological facts, such as gene-gene interactions, or the curation of GO cellular components, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences. Search engines for four literatures (C. elegans, Drosophila, Arabidopsis and Neuroscience) have been established by us, and thirteen systems for other literatures have been developed by other groups around the world. Currently, our four systems contain 112,000 papers with 40 million sentences; all systems worldwide contain 190,000 papers with approximately 65 million sentences.
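The three steps named in this abstract (split text into sentences, label terms by ontology category, query the labeled sentences) can be illustrated with a toy sketch. This is not Textpresso's code: the two-category `ONTOLOGY`, the naive sentence splitter and all function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy ontology (category -> terms); not the real Textpresso lexicon.
ONTOLOGY = {
    "gene": {"lin-12", "glp-1"},
    "regulation": {"regulates", "represses", "activates"},
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def label_sentence(sentence):
    """Return the set of categories that have at least one term in the sentence."""
    tokens = set(re.findall(r"[\w-]+", sentence.lower()))
    return {category for category, terms in ONTOLOGY.items() if tokens & terms}

def build_index(text):
    """Pair every sentence with its category labels (the 'database' of labeled sentences)."""
    return [(sentence, label_sentence(sentence)) for sentence in split_sentences(text)]

def query(index, required_categories):
    """Return sentences labeled with all of the required categories."""
    required = set(required_categories)
    return [sentence for sentence, labels in index if required <= labels]

index = build_index(
    "lin-12 regulates glp-1. The worm moved. Heat activates stress pathways."
)
hits = query(index, {"gene", "regulation"})
```

Here `hits` holds only the sentences that mention both a gene term and a regulation term, which is the kind of category-based query the abstract describes.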
- Published
- 2008
24. An overview of the BioCreative 2012 Workshop Track III: interactive text mining task
- Author
-
Hans-Michael Müller, Wasila M. Dahdul, Kimberly Van Auken, Johnny Chi Yang Wu, Melissa A. Haendel, Robert J. Dodson, Donghui Li, Kevin G. Becker, Yuling Li, Chih-Hsuan Wei, Martin Krallinger, Jeyakumar Natarajan, Catalina O. Tudor, Mary L. Schaeffer, Suresh Subramani, Susan M. Bello, Petra Fey, W. John Wilbur, Marc Gillespie, Paula M. Mabee, Ceri E. Van Slyke, Hong Cui, S. Jimenez, Zhiyong Lu, Kalpana Raja, Phoebe M. Roberts, Ben Carterette, K. Bretonnel Cohen, Luana Licata, Laurel Cooper, Cathy H. Wu, Andrew Chatr-aryamontri, Juan Miguel Cejuela, Juancarlos Chan, James P. Balhoff, Bethany R. Harris, Cecilia N. Arighi, Lisa Matthews, Pratibha Dubey, Julie Park, and Harold J. Drabkin
- Subjects
Time Factors ,Computer science ,Documentation ,General Biochemistry, Genetics and Molecular Biology ,Education ,Task (project management) ,03 medical and health sciences ,Data Mining ,Humans ,Biocurator ,030304 developmental biology ,0303 health sciences ,Information retrieval ,Data curation ,Learnability ,End user ,business.industry ,030302 biochemistry & molecular biology ,Usability ,Data science ,Databases as Topic ,Original Article ,User interface ,General Agricultural and Biological Sciences ,business ,Software ,Information Systems - Abstract
In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators’ overall experience of a system, regardless of the system’s high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.
- Published
- 2013
25. Additional file 2 of Tissue enrichment analysis for C. elegans genomics
- Author
-
David Angeles-Albores, Raymond N. Lee, Juancarlos Chan, and Paul Sternberg
- Subjects
3. Good health - Abstract
Folder Structure for SI files 3 and 4. A file detailing the folder structure of the zipped folders 3 and 4. (PDF 138 kb)
26. Additional file 2 of Tissue enrichment analysis for C. elegans genomics
- Author
-
David Angeles-Albores, Raymond N. Lee, Juancarlos Chan, and Paul Sternberg
- Subjects
3. Good health - Abstract
Folder Structure for SI files 3 and 4. A file detailing the folder structure of the zipped folders 3 and 4. (PDF 138 kb)
27. Toward an interactive article: integrating journals and biological databases
- Author
-
Juancarlos Chan, Raymund Stefancsik, Lolly Otis, Sharon Faelten, Steven J Marygold [0000-0003-2759-266X], Ruth Isaacson, Hans-Michael Müller, Marek S. Skrzypek, Arun Rangarajan, Karen Yook, Tim Schedl, Stephen Haenel, J. Michael Cherry, Paul W. Sternberg, and Tracey DePellegrin-Connelly
- Subjects
Quality Control ,0106 biological sciences ,Markup language ,Databases, Factual ,Computer science ,Biological database ,lcsh:Computer applications to medicine. Medical informatics ,01 natural sciences ,Biochemistry ,03 medical and health sciences ,Structural Biology ,Correspondence ,Databases, Genetic ,Animals ,Caenorhabditis elegans ,lcsh:QH301-705.5 ,Biology ,Molecular Biology ,030304 developmental biology ,Internet ,0303 health sciences ,Information retrieval ,Applied Mathematics ,Hyperlink ,Data science ,Pipeline (software) ,Computer Science Applications ,lcsh:Biology (General) ,lcsh:R858-859.7 ,Periodicals as Topic ,010606 plant biology & botany - Abstract
Background Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five years and have resulted in the development of automated tools that can recognize entities within a document and link those entities to a relevant database. Unfortunately, automated tools cannot resolve ambiguities that arise from one term being used to signify entities that are quite distinct from one another. Instead, resolving these ambiguities requires some manual oversight. Finding the right balance between the speed and portability of automation and the accuracy and flexibility of manual effort is a crucial goal to making text markup a successful venture. Results We have established a journal article mark-up pipeline that links GENETICS journal articles and the model organism database (MOD) WormBase. This pipeline uses a lexicon built with entities from the database as a first step. The entity markup pipeline results in links from over nine classes of objects including genes, proteins, alleles, phenotypes and anatomical terms. New entities and ambiguities are discovered and resolved by a database curator through a manual quality control (QC) step, along with help from authors via a web form that is provided to them by the journal. New entities discovered through this pipeline are immediately sent to an appropriate curator at the database. Ambiguous entities that do not automatically resolve to one link are resolved by hand ensuring an accurate link. This pipeline has been extended to other databases, namely Saccharomyces Genome Database (SGD) and FlyBase, and has been implemented in marking up a paper with links to multiple databases. 
Conclusions Our semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases such as WormBase. Our pipeline results in interactive articles that are data rich with high accuracy. The use of a manual quality control step sets this pipeline apart from other hyperlinking tools and results in benefits to authors, journals, readers and databases.
28. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) cellular component curation
- Author
-
Juancarlos Chan, Paul W. Sternberg, Kimberly Van Auken, Joshua Jaffery, and Hans-Michael Müller
- Subjects
Computer science ,Information Storage and Retrieval ,Ontology (information science) ,lcsh:Computer applications to medicine. Medical informatics ,Manual curation ,Biochemistry ,03 medical and health sciences ,Text mining ,Structural Biology ,lcsh:QH301-705.5 ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Information retrieval ,business.industry ,Gene ontology ,Applied Mathematics ,Methodology Article ,030302 biochemistry & molecular biology ,Computational Biology ,Proteins ,Pipeline (software) ,Computer Science Applications ,Data extraction ,lcsh:Biology (General) ,lcsh:R858-859.7 ,Precision and recall ,business ,Algorithms - Abstract
Background Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts. Results We employ the Textpresso category-based information retrieval and extraction system (http://www.textpresso.org), developed by WormBase, to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation to those of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). 
Measuring the relative efficiencies of Textpresso-based versus manual curation we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given differences in individual curatorial speed. Conclusion Textpresso is an effective tool for improving the efficiency of manual, experimentally based curation. Incorporating a Textpresso-based Cellular Component curation pipeline at WormBase has allowed us to transition from strictly manual curation of this data type to a more efficient pipeline of computer-assisted validation. Continued development of curation task-specific Textpresso categories will provide an invaluable resource for genomics databases that rely heavily on manual curation.
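The F-scores quoted in this abstract can be related to the recall and precision figures with a short calculation. The sketch below assumes the balanced F-score (harmonic mean of recall and precision), which the abstract does not state explicitly. The function name and stage labels are illustrative. Because the inputs are the rounded percentages from the abstract, a recomputed value can differ from the reported one in the last digit.

```python
def f_score(recall, precision):
    """Balanced F-score: harmonic mean of recall and precision (same units as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# (recall %, precision %) pairs as quoted in the abstract; stage labels are illustrative.
reported = {
    "curatable papers": (79.1, 61.8),
    "relevant sentences": (30.3, 80.1),
    "GO annotations": (66.2, 97.3),
}

for stage, (recall, precision) in reported.items():
    print(f"{stage}: F = {f_score(recall, precision):.1f}%")
```

The harmonic mean is pulled toward the lower of the two rates, which is why the sentence-level F-score stays low despite 80.1% precision.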