34 results on '"Winnenburg R"'
Search Results
2. GoGene: gene annotation in the fast lane
- Author
-
Plake, C., primary, Royer, L., additional, Winnenburg, R., additional, Hakenberg, J., additional, and Schroeder, M., additional
- Published
- 2009
3. Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?
- Author
-
Winnenburg, R., primary, Wachter, T., additional, Plake, C., additional, Doms, A., additional, and Schroeder, M., additional
- Published
- 2008
4. PHI-base update: additions to the pathogen host interaction database
- Author
-
Winnenburg, R., primary, Urban, M., additional, Beacham, A., additional, Baldwin, T. K., additional, Holland, S., additional, Lindeberg, M., additional, Hansen, H., additional, Rawlings, C., additional, Hammond-Kosack, K. E., additional, and Kohler, J., additional
- Published
- 2007
5. PHI-base: a new database for pathogen host interactions
- Author
-
Winnenburg, R., primary
- Published
- 2006
6. The OXL format for the exchange of integrated datasets
- Author
-
Taubert Jan, Sieren Klaus Peter, Hindle Matthew, Hoekman Berend, Winnenburg Rainer, Philippi Stephan, Rawlings Chris, and Köhler Jacob
- Subjects
- Biotechnology, TP248.13-248.65
- Abstract
A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to i) cover data from a broad range of application domains, ii) be flexible and extensible to combine many different complex data structures, iii) include metadata and semantic definitions, iv) include inferred information, v) identify the original data source for integrated entities and vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.
- Published
- 2007
7. The digital revolution in phenotyping
- Author
-
Oellrich A, Collier N, Groza T, Rebholz-Schuhmann D, Shah N, Bodenreider O, Boland MR, Georgiev I, Liu H, Livingston K, Luna A, Mallon AM, Manda P, Robinson PN, Rustici G, Simon M, Wang L, Winnenburg R, and Dumontier M
8. Implementation and relevance of FAIR data principles in biopharmaceutical R&D.
- Author
-
Wise J, de Barron AG, Splendiani A, Balali-Mood B, Vasant D, Little E, Mellino G, Harrow I, Smith I, Taubert J, van Bochove K, Romacker M, Walgemoed P, Jimenez RC, Winnenburg R, Plasterer T, Gupta V, and Hedley V
- Subjects
- Biological Products, Biomedical Research, Data Management, Drug Industry
- Abstract
Biopharmaceutical industry R&D, and indeed other life sciences R&D such as biomedical, environmental, agricultural and food production, is becoming increasingly data-driven and can significantly improve its efficiency and effectiveness by implementing the FAIR (findable, accessible, interoperable, reusable) guiding principles for scientific data management and stewardship. By so doing, the plethora of new and powerful analytical tools such as artificial intelligence and machine learning will be able, automatically and at scale, to access the data from which they learn, and on which they thrive. FAIR is a fundamental enabler for digital transformation. (Copyright © 2019 The Authors. Published by Elsevier Ltd. All rights reserved.)
- Published
- 2019
9. Interpretation of biological experiments changes with evolution of the Gene Ontology and its annotations.
- Author
-
Tomczak A, Mortensen JM, Winnenburg R, Liu C, Alessi DT, Swamy V, Vallania F, Lofgren S, Haynes W, Shah NH, Musen MA, and Khatri P
- Subjects
- Humans, Reproducibility of Results, Computational Biology, Databases, Genetic, Evolution, Molecular, Gene Ontology, Models, Genetic, Molecular Sequence Annotation
- Abstract
Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high throughput molecular data and generating hypotheses about underlying biological phenomena of experiments. However, the two building blocks of this analysis - the ontology and the annotations - evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by evolution of the GO over a decade. We found low consistency between enrichment analyses results obtained with early and more recent GO versions. Furthermore, there continues to be a strong annotation bias in the GO annotations where 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation and possibly reproducibility of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.
- Published
- 2018
10. U-Index, a dataset and an impact metric for informatics tools and databases.
- Author
-
Callahan A, Winnenburg R, and Shah NH
- Abstract
Measuring the usage of informatics resources such as software tools and databases is essential to quantifying their impact, value and return on investment. We have developed a publicly available dataset of informatics resource publications and their citation network, along with an associated metric (u-Index) to measure informatics resources' impact over time. Our dataset differentiates the context in which citations occur to distinguish between 'awareness' and 'usage', and uses a citing universe of open access publications to derive citation counts for quantifying impact. Resources with a high ratio of usage citations to awareness citations are likely to be widely used by others and have a high u-Index score. We have pre-calculated the u-Index for nearly 100,000 informatics resources. We demonstrate how the u-Index can be used to track informatics resource impact over time. The method of calculating the u-Index metric, the pre-computed u-Index values, and the dataset we compiled to calculate the u-Index are publicly available.
- Published
- 2018
11. Toward multimodal signal detection of adverse drug reactions.
- Author
-
Harpaz R, DuMouchel W, Schuemie M, Bodenreider O, Friedman C, Horvitz E, Ripple A, Sorbello A, White RW, Winnenburg R, and Shah NH
- Subjects
- Databases, Factual, Humans, United States, United States Food and Drug Administration, Adverse Drug Reaction Reporting Systems
- Abstract
Objective: Improving mechanisms to detect adverse drug reactions (ADRs) is key to strengthening post-marketing drug safety surveillance. Signal detection is presently unimodal, relying on a single information source. Multimodal signal detection is based on jointly analyzing multiple information sources. Building on and expanding the work done in prior studies, the aim of the article is to further research on multimodal signal detection, explore its potential benefits, and propose methods for its construction and evaluation. Material and Methods: Four data sources are investigated: FDA's adverse event reporting system, insurance claims, the MEDLINE citation database, and the logs of major Web search engines. Published methods are used to generate and combine signals from each data source. Two distinct reference benchmarks, corresponding to well-established and recently labeled ADRs respectively, are used to evaluate the performance of multimodal signal detection in terms of area under the ROC curve (AUC) and lead-time-to-detection, with the latter relative to labeling revision dates. Results: Limited to our reference benchmarks, multimodal signal detection provides AUC improvements ranging from 0.04 to 0.09 based on a widely used evaluation benchmark, and a comparative added lead-time of 7-22 months relative to labeling revision dates from a time-indexed benchmark. Conclusions: The results support the notion that utilizing and jointly analyzing multiple data sources may lead to improved signal detection. Given certain data and benchmark limitations, the early stage of development, and the complexity of ADRs, it is currently not possible to make definitive statements about the ultimate utility of the concept. Continued development of multimodal signal detection requires a deeper understanding of the data sources used, additional benchmarks, and further research on methods to generate and synthesize signals. (Copyright © 2017 Elsevier Inc. All rights reserved.)
- Published
- 2017
12. Drug repurposing by integrated literature mining and drug-gene-disease triangulation.
- Author
-
Sun P, Guo J, Winnenburg R, and Baumbach J
- Subjects
- Computational Biology methods, Drug Design, Humans, Research Design, Data Mining methods, Drug Repositioning methods
- Abstract
Drug design is expensive, time-consuming and becoming increasingly complicated. Computational approaches for inferring potentially new purposes of existing drugs, referred to as drug repositioning, play an increasingly important part in current pharmaceutical studies. Here, we first summarize recent developments in computational drug repositioning and introduce the utilized data sources. Afterwards, we introduce a new data fusion model based on n-cluster editing as a novel multi-source triangulation strategy, which was further combined with semantic literature mining. Our evaluation suggests that utilizing drug-gene-disease triangulation coupled to sophisticated text analysis is a robust approach for identifying new drug candidates for repurposing. (Copyright © 2016 Elsevier Ltd. All rights reserved.)
- Published
- 2017
13. Interoperability of Medication Classification Systems: Lessons Learned Mapping Established Pharmacologic Classes (EPCs) to SNOMED CT.
- Author
-
Nelson SD, Parker J, Lario R, Winnenburg R, Erlbaum MS, Lincoln MJ, and Bodenreider O
- Subjects
- Humans, Quality Improvement, Medication Systems, Systematized Nomenclature of Medicine
- Abstract
Interoperability among medication classification systems is known to be limited. We investigated the mapping of the Established Pharmacologic Classes (EPCs) to SNOMED CT. We compared lexical and instance-based methods to an expert-reviewed reference standard to evaluate contributions of these methods. Of the 543 EPCs, 284 had an equivalent SNOMED CT class, 205 were more specific, and 54 could not be mapped. Precision, recall, and F1 score were 0.416, 0.620, and 0.498 for lexical mapping and 0.616, 0.504, and 0.554 for instance-based mapping. Each automatic method has strengths, weaknesses, and unique contributions in mapping between medication classification systems. In our experience, it was beneficial to consider the mapping provided by both automated methods for identifying potential matches, gaps, inconsistencies, and opportunities for quality improvement between classifications. However, manual review by subject matter experts is still needed to select the most relevant mappings.
- Published
- 2017
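The F1 scores reported in the abstract above follow from the stated precision and recall values. The following is a minimal sketch of that arithmetic, assuming the standard harmonic-mean definition of F1; the function name `f1_score` is illustrative and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract above.
lexical_f1 = f1_score(0.416, 0.620)
instance_f1 = f1_score(0.616, 0.504)

print(round(lexical_f1, 3))   # lexical mapping
print(round(instance_f1, 3))  # instance-based mapping
```

Rounded to three decimals, both values agree with the 0.498 and 0.554 given in the abstract.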
14. Eliciting the Intension of Drug Value Sets - Principles and Quality Assurance Applications.
- Author
-
Bahr NJ, Nelson SD, Winnenburg R, and Bodenreider O
- Subjects
- Humans, Terminology as Topic, Pharmaceutical Preparations, Vocabulary, Controlled
- Abstract
Value sets (VSs) used in electronic clinical quality measures are lists of codes from standard terminologies ("extensional" VSs), whose purpose ("intension") is not always explicitly stated. We elicited the intension for the 09/01/2014 release of extensional medication value sets by comparison to drug classes from the October 2014 release of RxClass. Value sets matched drug classes if they shared common ingredients, as evidenced by Jaccard similarity score. We elicited the intension of 80 extensional value sets. The average Jaccard similarity was 0.65 for single classes and 0.80 for combination classes, with 34% (27/80) of the value sets having high similarity scores. Manual review by a pharmacist indicated 51% (41/80) of the drug classes selected as the best mapping for a value set matched the intension reflected in that value set name. This approach has the potential for facilitating the development and maintenance of medication value sets.
- Published
- 2017
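The abstract above matches value sets to drug classes by the Jaccard similarity of their ingredients. The following is a minimal sketch of that measure, not the authors' implementation; the two ingredient sets are hypothetical examples chosen only to illustrate the calculation.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical ingredient lists, for illustration only.
value_set = {"atorvastatin", "simvastatin", "pravastatin", "rosuvastatin"}
drug_class = {"atorvastatin", "simvastatin", "pravastatin",
              "lovastatin", "fluvastatin"}

score = jaccard(value_set, drug_class)
print(score)  # 3 shared ingredients out of 6 distinct ones
```

A score of 1.0 would mean the value set and the drug class contain exactly the same ingredients; lower scores indicate ingredients present in only one of the two.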
15. The digital revolution in phenotyping.
- Author
-
Oellrich A, Collier N, Groza T, Rebholz-Schuhmann D, Shah N, Bodenreider O, Boland MR, Georgiev I, Liu H, Livingston K, Luna A, Mallon AM, Manda P, Robinson PN, Rustici G, Simon M, Wang L, Winnenburg R, and Dumontier M
- Subjects
- Humans, Information Storage and Retrieval, Research Design, Translational Research, Biomedical, Phenotype
- Abstract
Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support 'bench to bedside' efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and the data storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state-of-the-art in digital phenotyping, by means of representing, acquiring and analyzing phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data. (© The Author 2015. Published by Oxford University Press.)
- Published
- 2016
16. Generalized enrichment analysis improves the detection of adverse drug events from the biomedical literature.
- Author
-
Winnenburg R and Shah NH
- Subjects
- Animals, Humans, Hypoglycemic Agents therapeutic use, MEDLINE, Pioglitazone, Thiazolidinediones therapeutic use, Urinary Bladder Neoplasms chemically induced, Computational Biology methods, Drug-Related Side Effects and Adverse Reactions, Information Storage and Retrieval, Medical Subject Headings
- Abstract
Background: Identification of associations between marketed drugs and adverse events from the biomedical literature assists drug safety monitoring efforts. Assessing the significance of such literature-derived associations and determining the granularity at which they should be captured remains a challenge. Here, we assess how defining a selection of adverse event terms from MeSH, based on information content, can improve the detection of adverse events for drugs and drug classes. Results: We analyze a set of 105,354 candidate drug adverse event pairs extracted from article indexes in MEDLINE. First, we harmonize extracted adverse event terms by aggregating them into higher-level MeSH terms based on the terms' information content. Then, we determine statistical enrichment of adverse events associated with drug and drug classes using a conditional hypergeometric test that adjusts for dependencies among associated terms. We compare our results with methods based on disproportionality analysis (proportional reporting ratio, PRR) and quantify the improvement in signal detection with our generalized enrichment analysis (GEA) approach using a gold standard of drug-adverse event associations spanning 174 drugs and four events. For single drugs, the best GEA method (Precision: .92/Recall: .71/F1-measure: .80) outperforms the best PRR based method (.69/.69/.69) on all four adverse event outcomes in our gold standard. For drug classes, our GEA performs similarly (.85/.69/.74) when increasing the level of abstraction for adverse event terms. Finally, on examining the 1609 individual drugs in our MEDLINE set, which map to chemical substances in ATC, we find signals for 1379 drugs (10,122 unique adverse event associations) on applying GEA with p < 0.005. Conclusions: We present an approach based on generalized enrichment analysis that can be used to detect associations between drugs, drug classes and adverse events at a given level of granularity, at the same time correcting for known dependencies among events. Our study demonstrates the use of GEA, and the importance of choosing appropriate abstraction levels to complement current drug safety methods. We provide an R package for exploration of alternative abstraction levels of adverse event terms based on information content.
- Published
- 2016
17. Feasibility of Prioritizing Drug-Drug-Event Associations Found in Electronic Health Records.
- Author
-
Banda JM, Callahan A, Winnenburg R, Strasberg HR, Cami A, Reis BY, Vilar S, Hripcsak G, Dumontier M, and Shah NH
- Subjects
- Databases, Factual, Drug Interactions, Feasibility Studies, Humans, Adverse Drug Reaction Reporting Systems, Data Mining methods, Drug-Related Side Effects and Adverse Reactions epidemiology, Electronic Health Records statistics & numerical data
- Abstract
Background and Objective: Several studies have demonstrated the ability to detect adverse events potentially related to multiple drug exposure via data mining. However, the number of putative associations produced by such computational approaches is typically large, making experimental validation difficult. We theorized that those potential associations for which there is evidence from multiple complementary sources are more likely to be true, and explored this idea using a published database of drug-drug-adverse event associations derived from electronic health records (EHRs). Methods: We prioritized drug-drug-event associations derived from EHRs using four sources of information: (1) public databases, (2) sources of spontaneous reports, (3) literature, and (4) non-EHR drug-drug interaction (DDI) prediction methods. After pre-filtering the associations by removing those found in public databases, we devised a ranking for associations based on the support from the remaining sources, and evaluated the results of this rank-based prioritization. Results: We collected information for 5983 putative EHR-derived drug-drug-event associations involving 345 drugs and ten adverse events from four data sources and four prediction methods. Only seven drug-drug-event associations (<0.5%) had support from the majority of evidence sources, and about one third (1777) had support from at least one of the evidence sources. Conclusions: Our proof-of-concept method for scoring putative drug-drug-event associations from EHRs offers a systematic and reproducible way of prioritizing associations for further study. Our findings also quantify the agreement (or lack thereof) among complementary sources of evidence for drug-drug-event associations and highlight the challenges of developing a robust approach for prioritizing signals of these associations.
- Published
- 2016
18. A method for systematic discovery of adverse drug events from clinical notes.
- Author
-
Wang G, Jung K, Winnenburg R, and Shah NH
- Subjects
- Drug-Related Side Effects and Adverse Reactions diagnosis, Humans, Machine Learning, Data Mining methods, Drug-Related Side Effects and Adverse Reactions classification, Electronic Health Records, Product Surveillance, Postmarketing methods
- Abstract
Objective: Adverse drug events (ADEs) are undesired harmful effects resulting from use of a medication, and occur in 30% of hospitalized patients. The authors have developed a data-mining method for systematic, automated detection of ADEs from electronic medical records. Materials and Methods: This method uses the text from 9.5 million clinical notes, along with prior knowledge of drug usages and known ADEs, as inputs. These inputs are further processed into statistics used by a discriminative classifier which outputs the probability that a given drug-disorder pair represents a valid ADE association. Putative ADEs identified by the classifier are further filtered for positive support in 2 independent, complementary data sources. The authors evaluate this method by assessing support for the predictions in other curated data sources, including a manually curated, time-indexed reference standard of label change events. Results: This method uses a classifier that achieves an area under the curve of 0.94 on a held-out test set. The classifier is used on 2,362,950 possible drug-disorder pairs comprised of 1602 unique drugs and 1475 unique disorders for which we had data, resulting in 240 high-confidence, well-supported drug-AE associations. Eighty-seven of them (36%) are supported in at least one of the resources that have information that was not available to the classifier. Conclusion: This method demonstrates the feasibility of systematic post-marketing surveillance for ADEs using electronic medical records, a key component of the learning healthcare system. (© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
- Published
- 2015
19. Leveraging MEDLINE indexing for pharmacovigilance - Inherent limitations and mitigation strategies.
- Author
-
Winnenburg R, Sorbello A, Ripple A, Harpaz R, Tonning J, Szarfman A, Francis H, and Bodenreider O
- Subjects
- Adverse Drug Reaction Reporting Systems, Data Mining, Humans, Information Storage and Retrieval, United States, United States Food and Drug Administration, Drug-Related Side Effects and Adverse Reactions, MEDLINE, Medical Subject Headings, Pharmacovigilance
- Abstract
Background: Traditional approaches to pharmacovigilance center on the signal detection from spontaneous reports, e.g., the U.S. Food and Drug Administration (FDA) adverse event reporting system (FAERS). In order to enrich the scientific evidence and enhance the detection of emerging adverse drug events that can lead to unintended harmful outcomes, pharmacovigilance activities need to evolve to encompass novel complementary data streams, for example the biomedical literature available through MEDLINE. Objectives: (1) To review how the characteristics of MEDLINE indexing influence the identification of adverse drug events (ADEs); (2) to leverage this knowledge to inform the design of a system for extracting ADEs from MEDLINE indexing; and (3) to assess the specific contribution of some characteristics of MEDLINE indexing to the performance of this system. Methods: We analyze the characteristics of MEDLINE indexing. We integrate three specific characteristics into the design of a system for extracting ADEs from MEDLINE indexing. We experimentally assess the specific contribution of these characteristics over a baseline system based on co-occurrence between drug descriptors qualified by adverse effects and disease descriptors qualified by chemically induced. Results: Our system extracted 405,300 ADEs from 366,120 MEDLINE articles. The baseline system accounts for 297,093 ADEs (73%). 85,318 ADEs (21%) can be extracted only after integrating specific pre-coordinated MeSH descriptors and additional qualifiers. 22,889 ADEs (6%) can be extracted only after considering indirect links between the drug of interest and the descriptor that bears the ADE context. Conclusions: In this paper, we demonstrate significant improvement over a baseline approach to identifying ADEs from MEDLINE indexing, which mitigates some of the inherent limitations of MEDLINE indexing for pharmacovigilance. ADEs extracted from MEDLINE indexing are complementary to, not a replacement for, other sources. (Published by Elsevier Inc.)
- Published
- 2015
20. Exploring adverse drug events at the class level.
- Author
-
Winnenburg R, Sorbello A, and Bodenreider O
- Abstract
Background: While the association between a drug and an adverse event (ADE) is generally detected at the level of individual drugs, ADEs are often discussed at the class level, i.e., at the level of pharmacologic classes (e.g., in drug labels). We propose two approaches, one visual and one computational, to exploring the contribution of individual drugs to the class signal. Methods: Having established a dataset of ADEs from MEDLINE, we aggregate drugs into ATC classes and ADEs into high-level MeSH terms. We compute statistical associations between drugs and ADEs at the drug level and at the class level. Finally, we visualize the signals at increasing levels of resolution using heat maps. We also automate the exploration of drug-ADE associations at the class level using clustering techniques. Results: Using our visual approach, we were able to uncover known associations, e.g., between fluoroquinolones and tendon injuries, and between statins and rhabdomyolysis. Using our computational approach, we systematically analyzed 488 associations between a drug class and an ADE. Conclusions: The findings gained from our exploratory techniques should be of interest to the curators of ADE repositories and drug safety professionals. Our approach can be applied to different drug-ADE datasets, using different drug classification systems and different signal detection algorithms.
- Published
- 2015
21. Using description logics to evaluate the consistency of drug-class membership relations in NDF-RT.
- Author
-
Winnenburg R, Mortensen JM, and Bodenreider O
- Abstract
Background: The NDF-RT (National Drug File Reference Terminology) is an ontology that describes drugs and their properties and supports computerized physician order entry systems. NDF-RT's classes are mostly specified using only necessary conditions and lack sufficient conditions, which limited its use until recently, when asserted drug-class relations were added. The addition of these asserted drug-class relations presents an opportunity to compare them with drug-class relations that can be inferred using the properties of drugs and drug classes in NDF-RT. Methods: We enriched NDF-RT's drug classes with sufficient conditions, added property equivalences, and then used an OWL reasoner to infer drug-class membership relations. We compared the inferred class relations to the recently added asserted relations derived from FDA Structured Product Labels. Results: The inferred and asserted relations only match in about 50% of the cases, due to incompleteness of the drug descriptions and quality issues in the class definitions. Conclusions: This investigation quantifies and categorizes the disparities between asserted and inferred drug-class relations and illustrates issues with class definitions and drug descriptions. In addition, it serves as an example of the benefits that description logics (DL) can add to ontology development and evaluation.
- Published
- 2015
22. Extending the coverage of phenotypes in SNOMED CT through post-coordination.
- Author
-
Dhombres F, Winnenburg R, Case JT, and Bodenreider O
- Subjects
- Humans, Information Storage and Retrieval methods, Unified Medical Language System, Phenotype, Systematized Nomenclature of Medicine
- Abstract
Objectives: To extend the coverage of phenotypes in SNOMED CT through post-coordination. Methods: We identify frequent modifiers in terms from the Human Phenotype Ontology (HPO), which we associate with templates for post-coordinated expressions in SNOMED CT. Results: We identified 176 modifiers, created 12 templates, and generated 1,617 post-coordinated expressions. Conclusions: Through this novel approach, we can increase the current number of mappings by 50%.
- Published
- 2015
23. Desiderata for an authoritative Representation of MeSH in RDF.
- Author
-
Winnenburg R and Bodenreider O
- Subjects
- MEDLINE, Semantics, Datasets as Topic, Internet standards, Medical Subject Headings
- Abstract
The Semantic Web provides a framework for the integration of resources on the web, which facilitates information integration and interoperability. RDF is the main representation format for Linked Open Data (LOD). However, datasets are not always made available in RDF by their producers and the Semantic Web community has had to convert some of these datasets to RDF in order for these datasets to participate in the LOD cloud. As a result, the LOD cloud sometimes contains outdated, partial and even inaccurate RDF datasets. We review the LOD landscape for one of these resources, MeSH, and analyze the characteristics of six existing representations in order to identify desirable features for an authoritative version, for which we create a prototype. We illustrate the suitability of this prototype on three common use cases. NLM intends to release an authoritative representation of MeSH in RDF (beta version) in the Fall of 2014.
- Published
- 2014
24. A time-indexed reference standard of adverse drug reactions.
- Author
-
Harpaz R, Odgers D, Gaskin G, DuMouchel W, Winnenburg R, Bodenreider O, Ripple A, Szarfman A, Sorbello A, Horvitz E, White RW, and Shah NH
- Subjects
- Data Mining, Drug Evaluation standards, Drug Labeling standards, Humans, MEDLINE, Reference Standards, Time Factors, United States, United States Food and Drug Administration, Adverse Drug Reaction Reporting Systems standards, Drug-Related Side Effects and Adverse Reactions
- Abstract
Undetected adverse drug reactions (ADRs) pose a major burden on the health system. Data mining methodologies designed to identify signals of novel ADRs are of deep importance for drug safety surveillance. The development and evaluation of these methodologies requires proper reference benchmarks. While progress has recently been made in developing such benchmarks, our understanding of the performance characteristics of the data mining methodologies is limited because existing benchmarks do not support prospective performance evaluations. We address this shortcoming by providing a reference standard to support prospective performance evaluations. The reference standard was systematically curated from drug labeling revisions, such as new warnings, which were issued and communicated by the US Food and Drug Administration in 2013. The reference standard includes 62 positive test cases and 75 negative controls, and covers 44 drugs and 38 events. We provide usage guidance and empirical support for the reference standard by applying it to analyze two data sources commonly mined for drug safety surveillance.
- Published
- 2014
25. A framework for assessing the consistency of drug classes across sources.
- Author
-
Winnenburg R and Bodenreider O
- Abstract
Background: The objective of this study is to develop a framework for assessing the consistency of drug classes across sources, such as MeSH and ATC. Our framework integrates and contrasts lexical and instance-based ontology alignment techniques. Moreover, we propose metrics for assessing not only equivalence relations, but also inclusion relations among drug classes. Results: We identified 226 equivalence relations between MeSH and ATC classes through the lexical alignment, and 223 through the instance-based alignment, with limited overlap between the two (36). We also identified 6,257 inclusion relations. Discrepancies between lexical and instance-based alignments are illustrated and discussed. Conclusions: Our work is the first attempt to align drug classes with sophisticated instance-based techniques, while also distinguishing between equivalence and inclusion relations. Additionally, it is the first application of aligning drug classes in ATC and MeSH. By providing a detailed account of similarities and differences between drug classes across sources, our framework has the prospect of effectively supporting the creation of a mapping of drug classes between ATC and MeSH by domain experts.
- Published
- 2014
- Full Text
- View/download PDF
26. Metrics for assessing the quality of value sets in clinical quality measures.
- Author
-
Winnenburg R and Bodenreider O
- Subjects
- Electronic Health Records, International Classification of Diseases, National Library of Medicine (U.S.), Systematized Nomenclature of Medicine, Unified Medical Language System standards, United States, Quality Indicators, Health Care, Vocabulary, Controlled
- Abstract
Objective: To assess the quality of value sets in clinical quality measures, both individually and as a population of value sets. Materials and Methods: The concepts from a given value set are expected to be rooted by one or few ancestor concepts and the value set is expected to contain all the descendants of its root concepts and only these descendants. (1) We assessed the completeness and correctness of individual value sets by comparison to the extension derived from their roots. (2) We assessed the non-redundancy of value sets for the entire population of value sets (within a given code system) using the Jaccard similarity measure. Results: We demonstrated the utility of our approach on some cases of inconsistent value sets and produced a list of 58 potentially duplicate value sets from the current set of clinical quality measures for the 2014 Meaningful Use criteria. Conclusion: These metrics are easy to compute and provide compact indicators of the completeness, correctness, and non-redundancy of value sets.
- Published
- 2013
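The completeness and correctness check described in the abstract of record 26 can be illustrated with a small sketch. It assumes a hypothetical `children` mapping (parent concept to child concepts) standing in for a real code system hierarchy, and it assumes that the root concepts themselves belong to the extension; neither detail is stated in the abstract, and this is not the authors' implementation.

```python
def descendants(children, roots):
    """Return every concept reachable from the roots (roots included, by assumption)."""
    seen = set()
    stack = list(roots)
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        # Concepts without an entry in the mapping are treated as leaves.
        stack.extend(children.get(concept, ()))
    return seen


def assess_value_set(value_set, children, roots):
    """Compare a value set with the extension derived from its roots."""
    extension = descendants(children, roots)
    codes = set(value_set)
    return {
        "missing": extension - codes,  # completeness problems
        "extra": codes - extension,    # correctness problems
    }


# Hypothetical toy hierarchy: A has children B and C; B has child D.
toy_children = {"A": ["B", "C"], "B": ["D"]}
report = assess_value_set({"A", "B", "X"}, toy_children, roots=["A"])
```

An empty `missing` set suggests the value set is complete with respect to its roots, and an empty `extra` set suggests it contains nothing outside their extension.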
27. Exploring pharmacoepidemiologic groupings of drugs from a clinical perspective.
- Author
-
Winnenburg R and Bodenreider O
- Subjects
- Databases, Pharmaceutical, Dictionaries, Pharmaceutic as Topic, Natural Language Processing, Pharmaceutical Preparations classification, Pharmacoepidemiology methods, Terminology as Topic, Vocabulary, Controlled
- Abstract
Objectives: To investigate the extent to which pharmacoepidemiologic groupings are homogeneous in terms of clinical properties. Methods: In our analysis, we classified drug subgroups from the pharmacoepidemiologic Anatomical Therapeutic Chemical (ATC) classification system based on clinical drug properties. We established mappings from ATC fifth level drug entities to drug property annotations in the National Drug File Reference Terminology (NDF-RT), including therapeutic categories, mechanisms of action, and physiologic effects. Based on the annotations for the individual drugs we computed homogeneity scores for all ATC groups and analyzed their distribution. Conclusions: We found ATC groups to be generally homogeneous, more so for mechanisms of action and physiologic effects than for therapeutic intent. However, only half of all ATC drugs can be analyzed with this approach, in part because of missing properties in NDF-RT.
- Published
- 2013
28. The NLM value set authority center.
- Author
-
Bodenreider O, Nguyen D, Chiang P, Chuang P, Madden M, Winnenburg R, McClure R, Emrick S, and D'Souza I
- Subjects
- Quality Control, Reference Standards, United States, Data Mining standards, Databases, Factual standards, National Library of Medicine (U.S.) standards, Terminology as Topic, User-Computer Interface, Vocabulary, Controlled
- Abstract
The Value Set Authority Center (VSAC) at the National Library of Medicine (NLM) provides downloadable access to all official versions of vocabulary value sets contained in the Clinical Quality Measures (CQMs) used in the certification criteria for electronic health record systems ("Meaningful Use" incentive program). Each value set consists of the numerical values (codes) and human-readable names (descriptions), drawn from standard vocabularies such as LOINC, RxNorm and SNOMED CT®, that are used to define clinical data elements used in clinical quality measures (e.g., patients with diabetes, tricyclic antidepressants). The content of the VSAC will gradually expand to incorporate value sets for other use cases, as well as for new measures and updates to existing measures.
- Published
- 2013
29. Issues in creating and maintaining value sets for clinical quality measures.
- Author
-
Winnenburg R and Bodenreider O
- Subjects
- RxNorm, Systematized Nomenclature of Medicine, Unified Medical Language System, Quality Assurance, Health Care, Vocabulary, Controlled
- Abstract
Objective: To develop methods for assessing the validity, consistency and currency of value sets for clinical quality measures, in order to support the developers of quality measures in which such value sets are used. Methods: We assessed the well-formedness of the codes (in a given code system) and the existence and currency of the codes in the corresponding code system, using the UMLS and RxNorm terminology services. We also investigated the overlap among value sets using the Jaccard similarity measure. Results: We extracted 163,788 codes (76,062 unique codes) from 1,463 unique value sets in the 113 quality measures published by the National Quality Forum (NQF) in December 2011. Overall, 5% of the codes are invalid (4% of the unique codes). We also found 67 duplicate value sets and 10 pairs of value sets exhibiting a high degree of similarity (Jaccard > 0.9). Conclusion: Invalid codes affect a large proportion of the value sets (19%). 79% of the quality measures have at least one value set exhibiting errors. However, 50% of the quality measures exhibit errors in less than 10% of their value sets. The existence of duplicate and highly-similar value sets suggests the need for an authoritative repository of value sets and related tooling in order to support the development of quality measures.
- Published
- 2012
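The overlap analysis mentioned in the abstract of record 29 relies on the Jaccard similarity measure, that is, the size of the intersection of two code sets divided by the size of their union. The sketch below is illustrative only: the value set names and codes are made up, and the 0.9 threshold is taken from the abstract's "Jaccard > 0.9" criterion.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections of codes: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def similar_pairs(value_sets, threshold=0.9):
    """Return (name1, name2, score) for every pair scoring above the threshold."""
    pairs = []
    for (name1, codes1), (name2, codes2) in combinations(sorted(value_sets.items()), 2):
        score = jaccard(codes1, codes2)
        if score > threshold:
            pairs.append((name1, name2, score))
    return pairs


# Hypothetical value sets; "vs_a" and "vs_b" are exact duplicates.
example_sets = {
    "vs_a": {"1", "2", "3"},
    "vs_b": {"1", "2", "3"},
    "vs_c": {"3", "4"},
}
duplicates = similar_pairs(example_sets)
```

A score of 1.0 marks exact duplicates; scores just below 1.0 flag near-duplicates that may deserve manual review.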
30. MeMotif: a database of linear motifs in alpha-helical transmembrane proteins.
- Author
-
Marsico A, Scheubert K, Tuukkanen A, Henschel A, Winter C, Winnenburg R, and Schroeder M
- Subjects
- Bacterial Proteins chemistry, Computational Biology trends, Databases, Protein, Information Storage and Retrieval methods, Internet, Protein Structure, Secondary, Protein Structure, Tertiary, Software, Amino Acid Motifs, Computational Biology methods, Databases, Genetic, Databases, Nucleic Acid, Membrane Proteins chemistry
- Abstract
Membrane proteins are important for many processes in the cell and used as main drug targets. The increasing number of high-resolution structures available makes for the first time a characterization of local structural and functional motifs in alpha-helical transmembrane proteins possible. MeMotif (http://projects.biotec.tu-dresden.de/memotif) is a database and wiki which collects more than 2000 known and novel computationally predicted linear motifs in alpha-helical transmembrane proteins. Motifs are fully described in terms of several structural and functional features and editable. Motifs contained in MeMotif can be used in different biological applications, from the identification of biochemically important functional residues which are candidates for mutagenesis experiments to the improvement of tools for transmembrane protein modeling.
- Published
- 2010
- Full Text
- View/download PDF
31. SuperCYP: a comprehensive database on Cytochrome P450 enzymes including a tool for analysis of CYP-drug interactions.
- Author
-
Preissner S, Kroll K, Dunkel M, Senger C, Goldsobel G, Kuzman D, Guenther S, Winnenburg R, Schroeder M, and Preissner R
- Subjects
- Animals, Computational Biology trends, Drug Interactions genetics, Humans, Information Storage and Retrieval methods, Internet, Polymorphism, Genetic, Protein Structure, Tertiary, Software, Computational Biology methods, Cytochrome P-450 Enzyme System chemistry, Cytochrome P-450 Enzyme System genetics, Databases, Genetic, Databases, Nucleic Acid, Databases, Protein, Drug Interactions physiology
- Abstract
Much of the information on the Cytochrome P450 enzymes (CYPs) is spread across literature and the internet. Aggregating knowledge about CYPs into one database makes the search more efficient. Text mining on 57 CYPs and drugs led to a mass of papers, which were screened manually for facts about metabolism, SNPs and their effects on drug degradation. Information was put into a database, which enables the user not only to look up a particular CYP and all metabolized drugs, but also to check tolerability of drug-cocktails and to find alternative combinations, to use metabolic pathways more efficiently. The SuperCYP database contains 1170 drugs with more than 3800 interactions including references. Approximately 2000 SNPs and mutations are listed and ordered according to their effect on expression and/or activity. SuperCYP (http://bioinformatics.charite.de/supercyp) is a comprehensive resource focused on CYPs and drug metabolism. Homology-modeled structures of the CYPs can be downloaded in PDB format and related drugs are available as MOL-files. Within the resource, CYPs can be aligned with each other, drug-cocktails can be 'mixed', SNPs, protein point mutations, and their effects can be viewed and corresponding PubMed IDs are given. SuperCYP is meant to be a platform and a starting point for scientists and health professionals for furthering their research.
- Published
- 2010
- Full Text
- View/download PDF
32. Improved mutation tagging with gene identifiers applied to membrane protein stability prediction.
- Author
-
Winnenburg R, Plake C, and Schroeder M
- Subjects
- Algorithms, Amino Acid Substitution, Animals, Databases, Genetic, Genes, Genomics, Humans, Membrane Proteins chemistry, Models, Genetic, Pattern Recognition, Automated, Periodicals as Topic, Phenotype, Point Mutation, Protein Stability, PubMed, Sequence Analysis, Computational Biology methods, Information Storage and Retrieval methods, Membrane Proteins genetics, Mutation
- Abstract
Background: The automated retrieval and integration of information about protein point mutations in combination with structure, domain and interaction data from literature and databases promises to be a valuable approach to study structure-function relationships in biomedical data sets. Results: We developed a rule- and regular expression-based protein point mutation retrieval pipeline for PubMed abstracts, which shows an F-measure of 87% for the mutation retrieval task on a benchmark dataset. In order to link mutations to their proteins, we utilize a named entity recognition algorithm for the identification of gene names co-occurring in the abstract, and establish links based on sequence checks. Vice versa, we could show that gene recognition improved from 77% to 91% F-measure when considering mutation information given in the text. To demonstrate practical relevance, we utilize mutation information from text to evaluate a novel solvation energy based model for the prediction of stabilizing regions in membrane proteins. For five G protein-coupled receptors we identified 35 relevant single mutations and associated phenotypes, of which none had been annotated in the UniProt or PDB database. In 71% of cases, reported phenotypes were in compliance with the model predictions, supporting a relation between mutations and stability issues in membrane proteins. Conclusion: We present a reliable approach for the retrieval of protein mutations from PubMed abstracts for any set of genes or proteins of interest. We further demonstrate how amino acid substitution information from text can be utilized for protein structure stability studies on the basis of a novel energy model.
- Published
- 2009
- Full Text
- View/download PDF
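The abstract of record 32 describes a regular expression-based retrieval of point mutations combined with sequence checks. The following is a minimal sketch of that general idea, not the authors' pipeline: the pattern, the function names, and the toy sequence are assumptions, and only the common notations `A123T` and `Ala123Thr` are covered.

```python
import re

# Three-letter to one-letter amino acid codes (standard 20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "".join(sorted(THREE_TO_ONE.values()))
THREE = "|".join(THREE_TO_ONE)

# Wild-type residue, 1-based position, mutant residue, in either notation.
MUTATION = re.compile(
    rf"\b(?:(?P<w1>[{ONE}])(?P<p1>[1-9]\d*)(?P<m1>[{ONE}])"
    rf"|(?P<w3>{THREE})(?P<p3>[1-9]\d*)(?P<m3>{THREE}))\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples using one-letter codes."""
    hits = []
    for m in MUTATION.finditer(text):
        if m.group("w1"):
            wild, pos, mutant = m.group("w1"), m.group("p1"), m.group("m1")
        else:
            wild = THREE_TO_ONE[m.group("w3")]
            pos = m.group("p3")
            mutant = THREE_TO_ONE[m.group("m3")]
        hits.append((wild, int(pos), mutant))
    return hits


def matches_sequence(sequence, wild, pos):
    """Sequence check: does the wild-type residue occur at this position?"""
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == wild


mutations = find_point_mutations("The A123T and Gly45Ser substitutions were studied.")
```

A bare pattern like this also matches look-alike tokens such as the cell line name T47D, which is one reason a sequence check against candidate proteins is a useful filter.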
33. PHI-base update: additions to the pathogen host interaction database.
- Author
-
Winnenburg R, Urban M, Beacham A, Baldwin TK, Holland S, Lindeberg M, Hansen H, Rawlings C, Hammond-Kosack KE, and Köhler J
- Subjects
- Anti-Infective Agents pharmacology, Bacteria genetics, Fungi genetics, Genes, Bacterial, Genes, Fungal, Internet, Oomycetes genetics, User-Computer Interface, Virulence Factors antagonists & inhibitors, Bacteria pathogenicity, Databases, Genetic, Fungi pathogenicity, Host-Pathogen Interactions genetics, Oomycetes pathogenicity, Virulence Factors genetics
- Abstract
The pathogen-host interaction database (PHI-base) is a web-accessible database that catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and Oomycete pathogens, which infect human, animal, plant, insect, fish and fungal hosts. Plant endophytes are also included. PHI-base is therefore an invaluable resource for the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. The database is freely accessible to both academic and non-academic users. This publication describes recent additions to the database and both current and future applications. The number of fields that characterize PHI-base entries has almost doubled. Important additional fields deal with new experimental methods, strain information, pathogenicity islands and external references that link the database to external resources, for example, gene ontology terms and Locus IDs. Another important addition is the inclusion of anti-infectives and their target genes, which makes it possible to predict the compounds that may interact with newly identified virulence factors. In parallel, the curation process has been improved and now involves several external experts. On the technical side, several new search tools have been provided and the database is also now distributed in XML format. PHI-base is available at: http://www.phi-base.org/.
- Published
- 2008
- Full Text
- View/download PDF
34. The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity.
- Author
-
Baldwin TK, Winnenburg R, Urban M, Rawlings C, Koehler J, and Hammond-Kosack KE
- Subjects
- Computational Biology methods, Fungi genetics, Oomycetes genetics, Virulence genetics, Databases, Genetic, Fungi pathogenicity, Oomycetes pathogenicity, Plants microbiology, Plants parasitology
- Abstract
Fungal and oomycete pathogens of plants and animals are a major global problem. In the last 15 years, many genes required for pathogenesis have been determined for over 50 different species. Other studies have characterized effector genes (previously termed avirulence genes) required to activate host responses. By studying these types of pathogen genes, novel targets for control can be revealed. In this report, we describe the Pathogen-Host Interactions database (PHI-base), which systematically compiles such pathogenicity genes involved in pathogen-host interactions. Here, we focus on the biology that underlies this computational resource: the nature of pathogen-host interactions, the experimental methods that exist for the characterization of such pathogen-host interactions as well as the available computational resources. Based on the data, we review and analyze the specific functions of pathogenicity genes, the host-specific nature of pathogenicity and virulence genes, and the generic mechanisms of effectors that trigger plant responses. We further discuss the utilization of PHI-base for the computational identification of pathogenicity genes through comparative genomics. In this context, the importance of standardizing pathogenicity assays as well as integrating databases to aid comparative genomics is discussed.
- Published
- 2006
- Full Text
- View/download PDF