18 results on "Jeremy Goecks"
Search Results
2. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space
- Author
-
Michael C. Schatz, Anthony A. Philippakis, Enis Afgan, Eric Banks, Vincent J. Carey, Robert J. Carroll, Alessandro Culotti, Kyle Ellrott, Jeremy Goecks, Robert L. Grossman, Ira M. Hall, Kasper D. Hansen, Jonathan Lawson, Jeffrey T. Leek, Anne O’Donnell Luria, Stephen Mosher, Martin Morgan, Anton Nekrutenko, Brian D. O’Connor, Kevin Osborn, Benedict Paten, Candace Patterson, Frederick J. Tan, Casey Overby Taylor, Jennifer Vessio, Levi Waldron, Ting Wang, Kristin Wuichet, Alexander Baumann, Andrew Rula, Anton Kovalsy, Clare Bernard, Derek Caetano-Anollés, Geraldine A. Van der Auwera, Justin Canas, Kaan Yuksel, Kate Herman, M. Morgan Taylor, Marianie Simeon, Michael Baumann, Qi Wang, Robert Title, Ruchi Munshi, Sushma Chaluvadi, Valerie Reeves, William Disman, Salin Thomas, Allie Hajian, Elizabeth Kiernan, Namrata Gupta, Trish Vosburg, Ludwig Geistlinger, Marcel Ramos, Sehyun Oh, Dave Rogers, Frances McDade, Mim Hastie, Nitesh Turaga, Alexander Ostrovsky, Alexandru Mahmoud, Dannon Baker, Dave Clements, Katherine E.L. Cox, Keith Suderman, Nataliya Kucher, Sergey Golitsynskiy, Samantha Zarate, Sarah J. Wheelan, Kai Kammers, Ana Stevens, Carolyn Hutter, Christopher Wellington, Elena M. Ghanaim, Ken L. Wiley, Jr., Shurjo K. Sen, Valentina Di Francesco, Denis Yuen, Brian Walsh, Luke Sargent, Vahid Jalili, John Chilton, Lori Shepherd, B.J. Stubbs, Ash O’Farrell, Benton A. Vizzier, Jr., Charles Overbeck, Charles Reid, David Charles Steinberg, Elizabeth A. Sheets, Julian Lucas, Lon Blauvelt, Louise Cabansay, Noah Warren, Brian Hannafious, Tim Harris, Radhika Reddy, Eric Torstenson, M. Katie Banasiewicz, Haley J. Abel, and Jason Walker
- Subjects
Genetics; QH426-470; Internal medicine; RC31-1245
- Abstract
Summary: The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement while also adding security measures for active threat detection and monitoring and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.
- Published
- 2022
3. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update
- Author
-
Vahid Jalili, Daniel Blankenberg, James Taylor, Jeremy Goecks, Qiang Gu, Anton Nekrutenko, Dave Clements, and Enis Afgan
- Subjects
Data Analysis; Proteomics; Biomedical Research; ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION; AcademicSubjects/SCI00010; Datasets as Topic; Computational biology; Biology; Access management; computer.software_genre; GeneralLiterature_MISCELLANEOUS; Server; Genetics; Metabolomics; business.industry; Published Erratum; ComputingMilieux_PERSONALCOMPUTING; Reproducibility of Results; Data science; Galaxy; Software framework; Web Server Issue; Nucleic acid; The Internet; Federated identity; Metagenomics; User interface; Single-Cell Analysis; business; Corrigendum; computer; Software
- Abstract
Galaxy (https://galaxyproject.org) is a web-based computational workbench used by tens of thousands of scientists across the world to analyze large biomedical datasets. Since 2005, the Galaxy project has fostered a global community focused on achieving accessible, reproducible, and collaborative research. Together, this community develops the Galaxy software framework, integrates analysis tools and visualizations into the framework, runs public servers that make Galaxy available via a web browser, performs and publishes analyses using Galaxy, leads bioinformatics workshops that introduce and use Galaxy, and develops interactive training materials for Galaxy. Over the last two years, all aspects of the Galaxy project have grown: code contributions, tools integrated, users, and training materials. Key advances in Galaxy's user interface include enhancements for analyzing large dataset collections as well as interactive tools for exploratory data analysis. Extensions to Galaxy's framework include support for federated identity and access management and increased ability to distribute analysis jobs to remote resources. New community resources include large public servers in Europe and Australia, an increasing number of regional and local Galaxy communities, and substantial growth in the Galaxy Training Network.
- Published
- 2020
4. Galaxy-ML: An accessible, reproducible, and scalable machine learning toolkit for biomedicine
- Author
-
Björn Grüning, Allison L. Creason, Jeremy Goecks, Anup Kumar, Alireza Khanteymoori, Vahid Jalili, Simon Bray, and Qiang Gu
- Subjects
0301 basic medicine; Science and Technology Workforce; Computer science; Astronomical Sciences; Careers in Research; computer.software_genre; Trees; Machine Learning; 0302 clinical medicine; Medicine and Health Sciences; Biology (General); GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries); ComputingMilieux_MISCELLANEOUS; Ecology; Suite; Eukaryota; Plants; Celestial Objects; Professions; Oncology; Computational Theory and Mathematics; Modeling and Simulation; Physical Sciences; Scalability; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Workbench; Supervised Machine Learning; Research Article; Computer and Information Sciences; Science Policy; QH301-705.5; Decision tree; Machine learning; 03 medical and health sciences; Cellular and Molecular Neuroscience; Deep Learning; Biomedical data; Artificial Intelligence; Genetics; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Biomedicine; Web browser; business.industry; Deep learning; Organisms; Biology and Life Sciences; Cancers and Neoplasms; Computational Biology; Reproducibility of Results; Galaxies; 030104 developmental biology; People and Places; Scientists; Population Groupings; Artificial intelligence; business; computer; Software; 030217 neurology & neurosurgery
- Abstract
Supervised machine learning is an essential but difficult-to-use approach in biomedical data analysis. The Galaxy-ML toolkit (https://galaxyproject.org/community/machine-learning/) makes supervised machine learning more accessible to biomedical scientists by enabling them to perform end-to-end reproducible machine learning analyses at large scale using only a web browser. Galaxy-ML extends Galaxy (https://galaxyproject.org), a biomedical computational workbench used by tens of thousands of scientists across the world, with a suite of tools for all aspects of supervised machine learning.
- Published
- 2021
5. A long-read RNA-seq approach to identify novel transcripts of very large genes
- Author
-
Terence A. Partridge, Karuna Panchapakesan, Eric P. Hoffman, Jeremy Goecks, Carsten G. Bönnemann, Jyoti K. Jaiswal, Prech Uapinyoying, and Susan Knoblach
- Subjects
Gene isoform; Method; RNA-Seq; Computational biology; Biology; 03 medical and health sciences; symbols.namesake; Exon; 0302 clinical medicine; Gene expression; Genetics; Humans; RNA, Messenger; Gene; Genetics (clinical); 030304 developmental biology; Repetitive Sequences, Nucleic Acid; Sanger sequencing; 0303 health sciences; Sequence Analysis, RNA; Gene Expression Profiling; Structural gene; Alternative splicing; Computational Biology; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Exons; Alternative Splicing; Organ Specificity; symbols; Transcriptome; 030217 neurology & neurosurgery
- Abstract
RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that only span up to two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing improves this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap: 5 kb), large (Neb: 22 kb), and very large (Ttn: 106 kb) transcripts in cardiac muscle, and fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. This also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification shown by our approach removes previous impediments to studies aimed at quantitative differential expression of ultralong transcripts.
- Published
- 2020
6. A harmonized meta-knowledgebase of clinical interpretations of somatic genomic variants in cancer
- Author
-
Susan M. Mockus, Deborah I. Ritter, David Tamborero, Obi L. Griffith, Jeremy Goecks, Gordana Raca, Damian T. Rieke, Georgia Mayfield, Nuria Lopez-Bigas, Jianjiong Gao, Kilannin Krysiak, Melissa A. Haendel, Ryan P Duren, Olivier Elemento, Kyle Ellrott, Jordi Deu-Pons, Adam A. Margolin, Brian Walsh, Tero Aittokallio, Michael Baudis, Rodrigo Dienstmann, Subha Madhavan, Julie A. McMurry, Sara E. Patterson, Ethan Cerami, Ozman Ugur Sezerman, Robert R. Freimuth, Beth A. Pitel, Nikolaus Schultz, Lynn M. Schriml, Alex H. Wagner, Jeremy L. Warner, Mark Lawler, Jacques S. Beckmann, Dmitriy Sonkin, Catherine Del Vecchio Fitz, Xuan Shirley Li, Debyani Chakravarty, Malachi Griffith, and Variant Interpretation for Cancer Consortium. Affiliations: Institut Català de la Salut; [Wagner AH, Krysiak K] Washington University School of Medicine, St. Louis, MO, USA; [Walsh B, Mayfield G] Oregon Health and Science University, Portland, OR, USA; [Tamborero D] Pompeu Fabra University, Barcelona, Spain, and Karolinska Institute, Solna, Sweden; [Sonkin D] National Cancer Institute, Rockville, MD, USA; [Dienstmann R] Vall d’Hebron Institute of Oncology (VHIO), Barcelona, Spain; Vall d'Hebron Barcelona Hospital Campus
- Subjects
Matching (statistics); Knowledge Bases; MEDLINE; Computational biology; Biology; 03 medical and health sciences; 0302 clinical medicine; Information Science::Computing Methodologies::Algorithms::Artificial Intelligence::Knowledge Bases [INFORMATION SCIENCE]; terapéutica::medicina de precisión [TÉCNICAS Y EQUIPOS ANALÍTICOS, DIAGNÓSTICOS Y TERAPÉUTICOS]; SDG 3 - Good Health and Well-being; Neoplasms; fenómenos genéticos::variación genética [FENÓMENOS Y PROCESOS]; Genetics research; Databases, Genetic; medicine; Genetics; Humans; Relevance (information retrieval); Medicina personalitzada; Precision Medicine; Ciencias de la información::metodologías computacionales::algoritmos::inteligencia artificial::bases del conocimiento [CIENCIA DE LA INFORMACIÓN]; 030304 developmental biology; Cancer; Structure (mathematical logic); 0303 health sciences; Intel·ligència artificial - Aplicacions a la medicina; Interpretation (philosophy); Diploidy; Genetic Variation/genetics; Genomics/methods; Neoplasms/genetics; Precision Medicine/methods; Genetic Variation; Genomics; medicine.disease; 3. Good health; Genetic Phenomena::Genetic Variation [PHENOMENA AND PROCESSES]; Genòmica; Precision oncology; 030220 oncology & carcinogenesis; Meta-analysis; Therapeutics::Precision Medicine [ANALYTICAL, DIAGNOSTIC AND THERAPEUTIC TECHNIQUES, AND EQUIPMENT]; Analysis
- Abstract
Precision oncology relies on accurate discovery and interpretation of genomic variants, enabling individualized diagnosis, prognosis and therapy selection. We found that six prominent somatic cancer variant knowledgebases were highly disparate in content, structure and supporting primary literature, impeding consensus when evaluating variants and their relevance in a clinical setting. We developed a framework for harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations. We demonstrated large gains in overlap between resources across variants, diseases and drugs as a result of this harmonization. We subsequently demonstrated improved matching between a patient cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 33% per individual knowledgebase to 57% in aggregate. Our analyses illuminate the need for open, interoperable sharing of variant interpretation data. We also provide a freely available web interface (search.cancervariants.org) for exploring the harmonized interpretations from these six knowledgebases.
This analysis presents a harmonized meta-knowledgebase to facilitate clinical interpretation of somatic genomic variants in cancer. This community-based project highlights the need for cooperative efforts to curate clinical interpretations of somatic variants for robust practice of precision oncology.
- Published
- 2020
7. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update
- Author
-
Vahid Jalili, Aysam Guerler, Daniel Blankenberg, Jeremy Goecks, Dave Clements, Anton Nekrutenko, Saskia Hiltemann, Jennifer Hillman-Jackson, John Chilton, James Taylor, Bérénice Batut, Björn Grüning, Martin Čech, Nate Coraor, Nicola Soranzo, Dannon Baker, Marius van den Beek, Helena Rasche, Enis Afgan, and Dave Bouvier
- Subjects
0301 basic medicine; Proteomics; ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION; International Cooperation; Datasets as Topic; Biology; GeneralLiterature_MISCELLANEOUS; 03 medical and health sciences; Scientific analysis; User-Computer Interface; 0302 clinical medicine; Server; Genetics; Humans; Metabolomics; Focus (computing); Internet; business.industry; Information Dissemination; ComputingMilieux_PERSONALCOMPUTING; Reproducibility of Results; Genomics; Data science; Galaxy; Molecular Imaging; 030104 developmental biology; Web Server Issue; Key (cryptography); The Internet; User interface; business; 030217 neurology & neurosurgery
- Abstract
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.
- Published
- 2018
8. PATH-14. GENETIC SUSCEPTIBILITY AND OUTCOMES OF PEDIATRIC, ADOLESCENT AND YOUNG ADULT IDH-MUTANT ASTROCYTOMAS
- Author
-
Sabine Mueller, Alberto Bronischer, Miriam Bornhorst, Lindsay Kilburn, Hayk Barseghyan, Eric Vilain, Tobey J. MacDonald, Joyce Turner, Brian R. Rood, Matthew Schniederjan, Jeremy Goecks, Denise Leung, Enrico Opocher, Javad Nazarian, Cheng-Ying Ho, Cynthia Hawkins, Eugene Hwang, Eric Bouffet, Daniel R. Boue, Carl Koschmann, Brent A. Orr, Uri Tabori, Alexander O. Vortmeyer, Rajen Mody, Michal Zapotocky, Roger J. Packer, Surajit Bhattacharya, Asher Marks, Liana Nobre, and David A. Solomon
- Subjects
Genetics; Cancer Research; Oncology; Mutant; Path (graph theory); Genetic predisposition; AcademicSubjects/MED00300; AcademicSubjects/MED00310; Neurology (clinical); Biology; Young adult; Pathology and Molecular Diagnosis
- Abstract
INTRODUCTION: Although IDH mutations in young patients were previously thought to be rare, recent case series have shown that they are more common than previously described. In this study, we analyzed IDH-mutant tumors to determine the clinical significance of these mutations in children, adolescents and young adults. METHODS: Through this multi-institution study (10 institutions), we collected 64 IDH1/2-mutant infiltrating astrocytoma specimens from 58 patients aged 4–26 (M:F, 0.4:0.6). Specimens included 46 low-grade (LGG) and 18 high-grade (HGG) astrocytomas. Tumor sequencing data (n=45), germline sequencing data (n=37) and outcome data (n=40) were analyzed. RESULTS: Similar to adults, most sequenced tumors had a co-mutation in the TP53 gene, while ATRX mutations were less common and primarily seen in HGGs. Approximately 60% (n=21) of patients with germline data available had a mutation in a cancer predisposition gene. Mismatch repair (MMR) mutations were most common (n=12; MSH6 n=9), followed by TP53 mutations (n=7). All patients with MMR gene mutations had HGGs and poor progression free (PFS=10% at 2 years, mean TTP=9 months) and overall (OS CONCLUSION: IDH-mutant tumors in pediatric patients are strongly associated with cancer predisposition and increased risk for progression/recurrence or malignant transformation. Routine screening for IDH1/2 mutations in children with grade 2–4 astrocytomas could greatly impact patient management.
- Published
- 2020
9. G-OnRamp: Generating genome browsers to facilitate undergraduate-driven collaborative genome annotation
- Author
-
Luke Sargent, Jeremy Goecks, Sarah C. R. Elgin, Nathan T. Mortimer, David Lopatto, Wilson Leung, and Yating Liu
- Subjects
0301 basic medicine; Computer science; Social Sciences; Genome browser; Genome; User-Computer Interface; Sequencing techniques; 0302 clinical medicine; Sociology; Databases, Genetic; Invertebrate Genomics; Biology (General); 0303 health sciences; Ecology; 4. Education; 05 social sciences; Structural gene; 050301 education; Computational gene; RNA sequencing; Genomics; Genome project; Drosophila melanogaster; Molecular Sequence Annotation; Computational Theory and Mathematics; Modeling and Simulation; Educational Status; Workshops; Algorithms; QH301-705.5; Education; World Wide Web; 03 medical and health sciences; Cellular and Molecular Neuroscience; Annotation; Computer Graphics; Genetics; Animals; Humans; natural sciences; Students; Gene; Molecular Biology; Ecology, Evolution, Behavior and Systematics; 030304 developmental biology; Comparative genomics; Base Sequence; Sequence Analysis, RNA; Computational Biology; Biology and Life Sciences; Gene Annotation; Comparative Genomics; Genome Analysis; Genome Annotation; Research and analysis methods; Molecular biology techniques; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Animal Genomics; People and Places; Population Groupings; 0503 education; Undergraduates; Software; 030217 neurology & neurosurgery
- Abstract
Scientists are sequencing new genomes at an increasing rate with the goal of associating genome contents with phenotypic traits. After a new genome is sequenced and assembled, structural gene annotation is often the first step in analysis. Despite advances in computational gene prediction algorithms, most eukaryotic genomes still benefit from manual gene annotation. This requires access to good genome browsers to enable annotators to visualize and evaluate multiple lines of evidence (e.g., sequence similarity, RNA sequencing [RNA-Seq] results, gene predictions, repeats) and necessitates many volunteers to participate in the work. To address the technical barriers to creating genome browsers, the Genomics Education Partnership (GEP; https://gep.wustl.edu/) has partnered with the Galaxy Project (https://galaxyproject.org) to develop G-OnRamp (http://g-onramp.org), a web-based platform for creating UCSC Genome Browser Assembly Hubs and JBrowse genome browsers. G-OnRamp also converts a JBrowse instance into an Apollo instance for collaborative genome annotations in research and educational settings. The genome browsers produced can be transferred to the CyVerse Data Store for long-term access. G-OnRamp enables researchers to easily visualize their experimental results, educators to create Course-based Undergraduate Research Experiences (CUREs) centered on genome annotation, and students to participate in genomics research. In the process, students learn about genes/genomes and about how to utilize large datasets. Development of G-OnRamp was guided by extensive user feedback. Sixty-five researchers/educators from >40 institutions participated through in-person workshops, which produced >20 genome browsers now available for research and education. Genome browsers generated for four parasitoid wasp species have been used in a CURE engaging students at 15 colleges and universities. 
Our assessment results in the classroom demonstrate that the genome browsers produced by G-OnRamp are effective tools for engaging undergraduates in research and in enabling their contributions to the scientific literature in genomics. Expansion of such genomics research/education partnerships will be beneficial to researchers, faculty, and students alike.
Author summary: Major projects now underway aim to sequence most of the multicellular organisms on earth (e.g., the Earth Biogenome Project). But obtaining this data is only the beginning. To understand these organisms and how they relate to each other, we need to annotate their genomes (i.e., identify the genes and other features). While computers are essential for this process, most annotation tasks still require or benefit from human analyses. Genome browsers allow annotators to quickly visualize and evaluate multiple lines of evidence to create the best gene models. Hence, annotation of a large number of eukaryotic species requires efficient generation of genome browsers and recruitment of many volunteers to participate. We have previously developed a web-based platform (G-OnRamp) to reduce the technical barriers for creating genome browsers. Using the G-OnRamp browsers, we engaged 15 faculty and their students in a Course-based Undergraduate Research Experience (CURE) focused on genome annotation of parasitoid wasp species. We find that G-OnRamp browsers work well in the classroom, and these efforts are beneficial for students and researchers. Students gain research experience, learn about genes and genomes, and learn how to work with large datasets. Researchers obtain high-quality datasets that could not be generated in any other way.
- Published
- 2020
10. Enabling precision medicine via standard communication of HTS provenance, analysis, and results
- Author
-
Jonas S. Almeida, Lydia Guo, Vahan Simonyan, Dan Taylor, Matthew Ezewudo, Hsinyi S. Tsang, Robel Kahsay, Anais Hayes, Jonathon Keeney, Elaine E. Thompson, Krista Smith, KanakaDurga Addepalli, Konstantinos Krampis, Gil Alterovitz, Anita Suresh, Raja Mazumder, Jeet Vora, Eric F. Donaldson, Amanda Bell, Carole Goble, Charles Hadley King, Yuching Lai, Michael R. Crusoe, Srikanth Gottipati, Stian Soiland-Reyes, Nuria Guimera, Hiroki Morizono, Paul Walsh, Marco Schito, Elaine Johanson, Jianchao Yao, Dennis A. Dean, Jeremy Goecks, Mark Walderhaug, Anjan Purkayastha, Toby Bloom, and Timothy C. Rodwell
- Subjects
0301 basic medicine; Science and Technology Workforce; Standardization; Computer science; Interoperability; Careers in Research; Workflow; Database and Informatics Methods; Software; 0302 clinical medicine; Documentation; Community Page; BioCompute Objects; Medicine and Health Sciences; Precision Medicine; Biology (General); 0303 health sciences; Genome; General Neuroscience; Communication; High-Throughput Nucleotide Sequencing; high-throughput sequencing; Genomics; Research Assessment; Genomic Databases; Reproducibility; HL7; 3. Good health; Professions; Open standard; 030220 oncology & carcinogenesis; NGS; HTS; General Agricultural and Biological Sciences; Bioinformatics; Science Policy; QH301-705.5; FHIR; Biology; CWL; Research and Analysis Methods; General Biochemistry, Genetics and Molecular Biology; Domain (software engineering); World Wide Web; 03 medical and health sciences; Genomic Medicine; Genetics; Animals; Humans; research objects; 030304 developmental biology; Clinical Genetics; General Immunology and Microbiology; business.industry; regulatory review; Personalized Medicine; Computational Biology; Reproducibility of Results; Biology and Life Sciences; Usability; Sequence Analysis, DNA; Genome Analysis; Precision medicine; Data science; GAG4H; Clinical trial; 030104 developmental biology; Biological Databases; People and Places; Scientists; Population Groupings; Software engineering; business
- Abstract
A personalized approach based on a patient's or pathogen’s unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. 
The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the “Open-Stand.org principles for collaborative open standards development.” With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.
This Community Page article presents a communication standard for the provenance of high-throughput sequencing data; a BioCompute Object (BCO) can serve as a history of what was computed, be used as part of a validation process, or provide clarity and transparency of an experimental process to collaborators.
- Published
- 2018
11. Biology Needs Evolutionary Software Tools: Let's Build Them Right
- Author
-
Jeremy Goecks, Galaxy Team, James Taylor, Daniel Blankenberg, and Anton Nekrutenko
- Subjects
0301 basic medicine; business.industry; software; Software tool; Interpretation (philosophy); evolutionary biology; Population biology; Biological evolution; Biology; Data science; Biological Evolution; 03 medical and health sciences; 030104 developmental biology; Software; Documentation; computational biology; Perspective; Genetics; Cancer biology; business; Shared responsibility; Molecular Biology; Ecology, Evolution, Behavior and Systematics
- Abstract
Research in population genetics and evolutionary biology has always provided a computational backbone for life sciences as a whole. Today evolutionary and population biology reasoning are essential for interpretation of large complex datasets that are characteristic of all domains of today’s life sciences ranging from cancer biology to microbial ecology. This situation makes algorithms and software tools developed by our community more important than ever before. This means that we, developers of software tools for molecular evolutionary analyses, now have a shared responsibility to make these tools accessible using modern technological developments as well as provide adequate documentation and training.
- Published
- 2018
12. Impact of germline and somatic missense variations on drug binding sites
- Author
-
Jeremy Goecks, Raja Mazumder, Quan Wan, John Torcivia-Rodriguez, A Voskanian, A Nayak, Nagarajan Pattabiraman, Cheng Yan, P Lam, and Yang Pan
- Subjects
Models, Molecular; 0301 basic medicine; Nonsynonymous substitution; Genotype; Pharmacogenomic Variants; Protein Conformation; Mutation, Missense; Plasma protein binding; Biology; Germline; Structure-Activity Relationship; 03 medical and health sciences; Germline mutation; Databases, Genetic; Genetics; Animals; Data Mining; Humans; Precision Medicine; Binding site; Exome; Germ-Line Mutation; Pharmacology; Binding Sites; Computational Biology; High-Throughput Nucleotide Sequencing; Systems Integration; Phenotype; 030104 developmental biology; Pharmaceutical Preparations; Drug development; Pharmacogenetics; Molecular Medicine; Original Article; Target protein; Protein Binding
- Abstract
Advancements in next-generation sequencing (NGS) technologies are generating a vast amount of data. This exacerbates the current challenge of translating NGS data into actionable clinical interpretations. We have comprehensively combined germline and somatic nonsynonymous single-nucleotide variations (nsSNVs) that affect drug binding sites in order to investigate their prevalence. The integrated data thus generated in conjunction with exome or whole-genome sequencing can be used to identify patients who may not respond to a specific drug because of alterations in drug binding efficacy due to nsSNVs in the target protein's gene. To identify the nsSNVs that may affect drug binding, protein-drug complex structures were retrieved from Protein Data Bank (PDB) followed by identification of amino acids in the protein-drug binding sites using an occluded surface method. Then, the germline and somatic mutations were mapped to these amino acids to identify which of these alter protein-drug binding sites. Using this method we identified 12 993 amino acid-drug binding sites across 253 unique proteins bound to 235 unique drugs. The integration of amino acid-drug binding sites data with both germline and somatic nsSNVs data sets revealed 3133 nsSNVs affecting amino acid-drug binding sites. In addition, a comprehensive drug target discovery was conducted based on protein structure similarity and conservation of amino acid-drug binding sites. Using this method, 81 paralogs were identified that could serve as alternative drug targets. In addition, non-human mammalian proteins bound to drugs were used to identify 142 homologs in humans that can potentially bind to drugs. In the current protein-drug pairs that contain somatic mutations within their binding site, we identified 85 proteins with significant differential gene expression changes associated with specific cancer types. 
Information on protein-drug binding, predicted drug target proteins, and the prevalence of both somatic and germline nsSNVs that disrupt these binding sites can provide valuable knowledge for personalized medicine treatment. A web portal is available where nsSNVs from an individual patient can be checked by scanning against DrugVar to determine whether any of the SNVs affect the binding of any drug in the database.
- Published
- 2016
13. Open pipelines for integrated tumor genome profiles reveal differences between pancreatic cancer tumors and cell lines
- Author
-
Bassel F. El-Rayes, Jeremy Goecks, Shishir K. Maithel, H. Jean Khoury, Michael R. Rossi, and James Taylor
- Subjects
Cancer Research ,Receptor, ErbB-2 ,pancreatic cancer ,Biology ,Genome ,Cell Line ,Proto-Oncogene Proteins p21(ras) ,Transcriptome ,Cell Line, Tumor ,Proto-Oncogene Proteins ,Pancreatic cancer ,medicine ,Humans ,Exome ,Radiology, Nuclear Medicine and imaging ,Cancer Biology ,Genetics ,Analysis pipelines ,Genome, Human ,Sequence Analysis, RNA ,genomic tumor profiles ,Gene Expression Profiling ,High-Throughput Nucleotide Sequencing ,Cancer ,bioinformatics ,medicine.disease ,3. Good health ,Pancreatic Neoplasms ,Gene expression profiling ,Oncology ,Mutation ,Genomic Profile ,ras Proteins ,Human genome ,galaxy ,Genes, Neoplasm - Abstract
We describe open, reproducible pipelines that create an integrated genomic profile of a cancer and use the profile to find mutations associated with disease and potentially useful drugs. These pipelines analyze high-throughput cancer exome and transcriptome sequence data together with public databases to find relevant mutations and drugs. The three pipelines that we have developed are: (1) an exome analysis pipeline, which uses whole or targeted tumor exome sequence data to produce a list of putative variants (no matched normal data are needed); (2) a transcriptome analysis pipeline that processes whole tumor transcriptome sequence (RNA-seq) data to compute gene expression and find potential gene fusions; and (3) an integrated variant analysis pipeline that uses the tumor variants from the exome pipeline and tumor gene expression from the transcriptome pipeline to identify deleterious and druggable mutations in all genes and in highly expressed genes. These pipelines are integrated into the popular Web platform Galaxy at http://usegalaxy.org/cancer to make them accessible and reproducible, thereby providing an approach for doing standardized, distributed analyses in clinical studies. We have used our pipeline to identify similarities and differences between pancreatic adenocarcinoma cancer cell lines and primary tumors.
- Published
- 2015
14. 4. Coordinating variant interpretation knowledgebases improves clinical interpretation of genomic variants in cancers
- Author
-
Dmitriy Sonkin, Xuan Shirley Li, David Tamborero, Georgia Mayfield, Jacques S. Beckmann, Brian Walsh, Obi L. Griffith, Adam Margolin, Rodrigo Dienstmann, Jeremy Goecks, Alex H. Wagner, Nuria Lopez-Bigas, and Malachi Griffith
- Subjects
Cancer Research ,Interpretation (philosophy) ,Genetics ,Computational biology ,Biology ,Molecular Biology - Published
- 2018
15. A quick guide for building a successful bioinformatics community
- Author
-
Manuel Corpas, B. F. Francis Ouellette, Jeremy Goecks, Michelle D. Brazas, Aleksandra Pawlik, Aidan Budd, Magali Michaut, Niklas Blomberg, Nicola Mulder, Jonathan C. Fuller, Institute of Infectious Disease and Molecular Medicine, and Faculty of Health Sciences
- Subjects
Knowledge management ,Bioinformatics ,Context (language use) ,Scientific field ,Education ,Social group ,Cellular and Molecular Neuroscience ,Genetics ,Humans ,Social media ,Sociology ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Internet ,Ecology ,business.industry ,Communication ,Software development ,Intelligent decision support system ,Computational Biology ,Social communication ,Genome analysis ,Galaxies ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,The Internet ,Workshops ,business ,Social Media ,Career development - Abstract
“Scientific community” refers to a group of people collaborating together on scientific-research-related activities who also share common goals, interests, and values. Such communities play a key role in many bioinformatics activities. Communities may be linked to a specific location or institute, or involve people working at many different institutions and locations. Education and training is typically an important component of these communities, providing a valuable context in which to develop skills and expertise, while also strengthening links and relationships within the community. Scientific communities facilitate: (i) the exchange and development of ideas and expertise; (ii) career development; (iii) coordinated funding activities; (iv) interactions and engagement with professionals from other fields; and (v) other activities beneficial to individual participants, communities, and the scientific field as a whole. It is thus beneficial at many different levels to understand the general features of successful, high-impact bioinformatics communities; how individual participants can contribute to the success of these communities; and the role of education and training within these communities. We present here a quick guide to building and maintaining a successful, high-impact bioinformatics community, along with an overview of the general benefits of participating in such communities. This article grew out of contributions made by organizers, presenters, panelists, and other participants of the ISMB/ECCB 2013 workshop “The ‘How To Guide’ for Establishing a Successful Bioinformatics Network” at the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB).
- Published
- 2015
16. Web-based visual analysis for high-throughput genomics
- Author
-
Jeremy Goecks, James Taylor, Tomithy Too, Carl Eberhard, and Anton Nekrutenko
- Subjects
medicine.medical_specialty ,Web server ,Statistics as Topic ,Genome browser ,Computational biology ,Biology ,computer.software_genre ,Circos ,Computer graphics ,03 medical and health sciences ,0302 clinical medicine ,Genetics ,medicine ,Computer Graphics ,Web application ,Phylogeny ,030304 developmental biology ,Visualization ,0303 health sciences ,Internet ,business.industry ,Information Dissemination ,Publications ,High-Throughput Nucleotide Sequencing ,Genomics ,Data science ,Data flow diagram ,Galaxy ,030220 oncology & carcinogenesis ,Visual analysis ,The Internet ,business ,Web modeling ,computer ,Software ,Biotechnology ,Phylogenetic tree - Abstract
Background: Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. Results: We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. Conclusions: Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments.
- Published
- 2013
17. Integrative approach reveals composition of endoparasitoid wasp venoms
- Author
-
Jeremy Goecks, Nathan T. Mortimer, James A. Mobley, Gregory J. Bowersock, James Taylor, and Todd A. Schlenke
- Subjects
Male ,Proteome ,Wasp Venoms ,Animal Evolution ,Wasps ,lcsh:Medicine ,Venom ,Bioinformatics ,Antioxidants ,Sequence Analysis, Protein ,Genome Sequencing ,Heterotoma ,lcsh:Science ,Genome Evolution ,Genetics ,0303 health sciences ,Multidisciplinary ,Drosophila Melanogaster ,030302 biochemistry & molecular biology ,Genomics ,Animal Models ,3. Good health ,Functional Genomics ,Insect Proteins ,Female ,Drosophila melanogaster ,Sequence Analysis ,Glycolysis ,Research Article ,Signal peptide ,food.ingredient ,Sequence analysis ,Immunology ,Virulence ,Biology ,Host-Parasite Interactions ,Immunomodulation ,03 medical and health sciences ,food ,Model Organisms ,Species Specificity ,Animals ,Gene ,030304 developmental biology ,Immune Evasion ,Evolutionary Biology ,Sequence Homology, Amino Acid ,fungi ,lcsh:R ,Computational Biology ,Molecular Sequence Annotation ,Sequence Analysis, DNA ,Comparative Genomics ,biology.organism_classification ,Organismal Evolution ,Immunity, Innate ,lcsh:Q ,Genome Expression Analysis ,Transcriptome - Abstract
The fruit fly Drosophila melanogaster and its endoparasitoid wasps are a developing model system for interactions between host immune responses and parasite virulence mechanisms. In this system, wasps use diverse venom cocktails to suppress the conserved fly cellular encapsulation response. Although numerous genetic tools allow detailed characterization of fly immune genes, lack of wasp genomic information has hindered characterization of the parasite side of the interaction. Here, we use high-throughput nucleic acid and amino acid sequencing methods to describe the venoms of two related Drosophila endoparasitoids with distinct infection strategies, Leptopilina boulardi and L. heterotoma. Using RNA-seq, we assembled and quantified libraries of transcript sequences from female wasp abdomens. Next, we used mass spectrometry to sequence peptides derived from dissected venom gland lumens. We then mapped the peptide spectral data against the abdomen transcriptomes to identify a set of putative venom genes for each wasp species. Our approach captured the three venom genes previously characterized in L. boulardi by traditional cDNA cloning methods as well as numerous new venom genes that were subsequently validated by a combination of RT-PCR, blast comparisons, and secretion signal sequence search. Overall, 129 proteins were found to comprise L. boulardi venom and 176 proteins were found to comprise L. heterotoma venom. We found significant overlap in L. boulardi and L. heterotoma venom composition but also distinct differences that may underlie their unique infection strategies. Our joint transcriptomic-proteomic approach for endoparasitoid wasp venoms is generally applicable to identification of functional protein subsets from any non-genome sequenced organism.
- Published
- 2013
18. Dietary and Flight Energetic Adaptations in a Salivary Gland Transcriptome of an Insectivorous Bat
- Author
-
Robert J. Baker, Enrique P. Lessa, Jeremy Goecks, Carleton J. Phillips, Bernard Tandler, Cibele G. Sotero-Caio, Caleb D. Phillips, and Michael R. Gannon
- Subjects
Proteomics ,Evolutionary Genetics ,Proteome ,Animal Evolution ,Gene Dosage ,Evolutionary Selection ,Biochemistry ,Transcriptomes ,Transcriptome ,0302 clinical medicine ,Chiroptera ,Gene Duplication ,Gene duplication ,Genome Evolution ,Genetics ,Evolutionary Theory ,0303 health sciences ,Multidisciplinary ,Ecology ,biology ,Salivary gland ,Hydrolysis ,Genomics ,Myotis lucifugus ,Adaptation, Physiological ,medicine.anatomical_structure ,Medicine ,Research Article ,Evolutionary Processes ,Science ,Submandibular Gland ,Hyperlipidemias ,Evolution, Molecular ,03 medical and health sciences ,Genome Analysis Tools ,medicine ,Animals ,Biology ,Gene ,030304 developmental biology ,Evolutionary Biology ,Proteins ,Computational Biology ,Biological Transport ,Genomic Evolution ,Lipid metabolism ,Lipid Metabolism ,biology.organism_classification ,Dietary Fats ,Organismal Evolution ,Diet ,Secretory protein ,Evolutionary Ecology ,Flight, Animal ,Energy Metabolism ,030217 neurology & neurosurgery - Abstract
We hypothesized that evolution of salivary gland secretory proteome has been important in adaptation to insectivory, the most common dietary strategy among Chiroptera. A submandibular salivary gland (SMG) transcriptome was sequenced for the little brown bat, Myotis lucifugus. The likely secretory proteome of 23 genes included seven (RETNLB, PSAP, CLU, APOE, LCN2, C3, CEL) related to M. lucifugus insectivorous diet and metabolism. Six of the secretory proteins probably are endocrine, whereas one (CEL) most likely is exocrine. The encoded proteins are associated with lipid hydrolysis, regulation of lipid metabolism, lipid transport, and insulin resistance. They are capable of processing exogenous lipids for flight metabolism while foraging. Salivary carboxyl ester lipase (CEL) is thought to hydrolyze insect lipophorins, which probably are absorbed across the gastric mucosa during feeding. The other six proteins are predicted either to maintain these lipids at high blood concentrations or to facilitate transport and uptake by flight muscles. Expression of these seven genes and coordinated secretion from a single organ is novel to this insectivorous bat, and apparently has evolved through instances of gene duplication, gene recruitment, and nucleotide selection. Four of the recruited genes are single-copy in the Myotis genome, whereas three have undergone duplication(s) with two of these genes exhibiting evolutionary ‘bursts’ of duplication resulting in multiple paralogs. Evidence for episodic directional selection was found for six of seven genes, reinforcing the conclusion that the recruited genes have important roles in adaptation to insectivory and the metabolic demands of flight. Intragenic frequencies of mobile-element-like sequences differed from frequencies in the whole M. lucifugus genome. Differences among recruited genes imply separate evolutionary trajectories and that adaptation was not a single, coordinated event.
- Published
- 2014