Extending ontologies by finding siblings using set expansion techniques
- Source :
- Bioinformatics
- Publication Year :
- 2012
Abstract
- Motivation: Ontologies are an everyday tool in biomedicine to capture and represent knowledge. However, many ontologies lack a high degree of coverage in their domain and need to improve their overall quality and maturity. Automatically extending sets of existing terms will enable ontology engineers to systematically improve text-based ontologies level by level.
- Results: We developed an approach to extend ontologies by discovering new terms which are in a sibling relationship to existing terms of an ontology. For this purpose, we combined two approaches which retrieve new terms from the web. The first approach extracts siblings by exploiting the structure of HTML documents, whereas the second approach uses text mining techniques to extract siblings from unstructured text. Our evaluation against MeSH (Medical Subject Headings) shows that our method for sibling discovery is able to suggest first-class ontology terms and can be used as an initial step towards assessing the completeness of ontologies. The evaluation yields a recall of 80% at a precision of 61% where the two independent approaches are complementing each other. For MeSH in particular, we show that it can be considered complete in its medical focus area. We integrated the work into DOG4DAG, an ontology generation plugin for the editors OBO-Edit and Protégé, making it the first plugin that supports sibling discovery on-the-fly.
- Availability: Sibling discovery for ontology is available as part of DOG4DAG (www.biotec.tu-dresden.de/research/schroeder/dog4dag) for both Protégé 4.1 and OBO-Edit 2.1.
- Contact: ms@biotec.tu-dresden.de; goetz.fabian@biotec.tu-dresden.de
- Supplementary information: Supplementary data are available at Bioinformatics online.
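The abstract describes combining a structure-based extractor (HTML documents) with a text-based extractor (unstructured text) and evaluating the result by precision and recall. The following is a minimal Python sketch of that general idea, not the authors' implementation: the function names, the rule of requiring two seed terms in one HTML list, the "such as" pattern, and the toy data are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect <li> texts grouped by enclosing <ul>/<ol> (nesting handled only roughly)."""

    def __init__(self):
        super().__init__()
        self.lists = []       # finished lists, each a list of item strings
        self._stack = []      # lists that are currently open
        self._in_item = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append([])
        elif tag == "li" and self._stack:
            self._in_item = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            text = "".join(self._buf).strip()
            if text:
                self._stack[-1].append(text)
            self._in_item = False
        elif tag in ("ul", "ol") and self._stack:
            self.lists.append(self._stack.pop())

    def handle_data(self, data):
        if self._in_item:
            self._buf.append(data)


def siblings_from_html(html, seeds):
    """Hypothetical structure-based step: items sharing a list with >= 2 seeds."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    seeds_lc = {s.lower() for s in seeds}
    found = set()
    for items in parser.lists:
        items_lc = {i.lower() for i in items}
        if len(items_lc & seeds_lc) >= 2:
            found |= items_lc - seeds_lc
    return found


def siblings_from_text(text, seeds):
    """Hypothetical text-based step: terms enumerated after 'such as' next to a seed."""
    seeds_lc = {s.lower() for s in seeds}
    found = set()
    for match in re.finditer(r"such as ([^.]+)", text.lower()):
        parts = re.split(r",|\band\b|\bor\b", match.group(1))
        terms = {p.strip() for p in parts if p.strip()}
        if terms & seeds_lc:
            found |= terms - seeds_lc
    return found


def precision_recall(candidates, gold):
    """Precision and recall of a candidate set against a gold-standard set."""
    hits = len(candidates & gold)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented for illustration, not taken from the paper).
seeds = {"influenza", "measles"}
html = "<ul><li>Influenza</li><li>Measles</li><li>Mumps</li><li>Contact us</li></ul>"
text = "Common viral infections include diseases such as influenza, measles and rubella."
gold = {"mumps", "rubella", "chickenpox"}

from_html = siblings_from_html(html, seeds)
from_text = siblings_from_text(text, seeds)
combined = from_html | from_text

print("HTML only:", precision_recall(from_html, gold))
print("Text only:", precision_recall(from_text, gold))
print("Combined: ", precision_recall(combined, gold))
```

In this toy case each extractor finds a different correct sibling, so the union has higher recall than either extractor alone, which mirrors the complementarity the abstract reports. The union also inherits noise from both sources (here the navigation item "contact us"), so a real system would need a filtering or ranking step.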
- Subjects :
- Statistics and Probability
Computer science
Process ontology
Databases and Ontologies
ISMB 2012 Proceedings Papers Committee, July 15 to July 19, 2012, Long Beach, CA, USA
02 engineering and technology
Ontology (information science)
Biochemistry
03 medical and health sciences
Medical Subject Headings
Text mining
Ontology components
Terminology as Topic
0202 electrical engineering, electronic engineering, information engineering
Data Mining
Molecular Biology
Biomedicine
030304 developmental biology
0303 health sciences
Internet
Information retrieval
business.industry
Protégé
Original Papers
Computer Science Applications
Computational Mathematics
Computational Theory and Mathematics
Ontology
020201 artificial intelligence & image processing
The Internet
business
Algorithms
Details
- ISSN :
- 1367-4811
- Volume :
- 28
- Issue :
- 12
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....a458512aca3b7bd2c72b2c448767841f