Curation tools for taxonomic databases

Authors :
Morse, David
De Roeck, Anne
Willis, Alistair
Yang, Hui
Source :
BioCuration 2013
Publication Year :
2013

Abstract

Biological taxonomy is the classification of living and fossil organisms. Taxonomists have identified and named some 1.8 million species of animals, plants, and microorganisms, a fraction of Earth's estimated 5–30 million species. Part of the work of taxonomy lies in developing and curating taxonomic databases, which support access to the taxonomic literature and provide the basic knowledge needed for the management and conservation of biodiversity.

A major difficulty facing this task is incorporating knowledge that is currently contained only in the historical literature. Extracting this knowledge is difficult and labour‐intensive: scanning errors and other variations in nomenclature mean that individual names must be verified manually. For example, Actinobacillus actionomy, Actinobacillus actionomyce, and Actinobacillus actionomycetam could all be variants of the same name. ComTax is an ongoing project to develop a community‐driven curation process among taxonomists by providing tools that help them identify and validate taxonomic names in scanned historical literature. The system operates on scanned documents after optical character recognition (OCR). The key stages are:

1. Identify possible new taxonomic names in the scanned text. A name may appear new either because it does not occur in existing databases or because OCR has misread it.

2. Present each proposed name to a domain expert for validation or correction.

3. Present validated taxonomic names for curation. Organizations such as the Global Biodiversity Information Facility (GBIF) manage the curation of taxonomic databases.

This poster describes the technical challenges facing the ComTax project and discusses our work with the Natural History Museum to integrate the curation process into taxonomists' workflow. We also demonstrate the relevance of this work within the wider context of biological taxonomy curation.
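The first ComTax stage, flagging candidate names that are either absent from a reference database or plausibly OCR misreadings of a known name, can be illustrated with fuzzy string matching from Python's standard-library `difflib`. This is a minimal sketch under stated assumptions, not the project's actual method: the function `triage`, its three bucket labels, the 0.75 similarity cutoff, and the sample reference list are all hypothetical, and it assumes candidate names have already been extracted from the OCR text.

```python
"""Hypothetical sketch of name triage after OCR (not ComTax's actual code).

Assumes candidate names have already been extracted from the OCR text and
that a reference list of known names is available as plain strings.
"""
from difflib import get_close_matches


def triage(candidates, known_names, cutoff=0.75):
    """Sort candidate names into buckets for expert review.

    "known": exact match in the reference list.
    "possible_ocr_variant": no exact match, but one or more known names are
        similar enough (difflib ratio >= cutoff) to be suggested corrections.
    "new": nothing similar; a genuinely new name or a badly garbled one.
    """
    known = set(known_names)
    result = {"known": [], "possible_ocr_variant": {}, "new": []}
    for name in candidates:
        if name in known:
            result["known"].append(name)
            continue
        # Up to three closest known names, best first; anything whose
        # similarity ratio falls below `cutoff` is discarded.
        suggestions = get_close_matches(name, known_names, n=3, cutoff=cutoff)
        if suggestions:
            result["possible_ocr_variant"][name] = suggestions
        else:
            result["new"].append(name)
    return result


if __name__ == "__main__":
    # Illustrative (hypothetical) reference list; a real one would come
    # from a curated taxonomic database.
    reference = ["Actinobacillus actinomycetemcomitans", "Actinobacillus lignieresii"]
    scanned = ["Actinobacillus actionomyce", "Actinobacillus lignieresii", "Quercus robur"]
    print(triage(scanned, reference))
    # {'known': ['Actinobacillus lignieresii'],
    #  'possible_ocr_variant': {'Actinobacillus actionomyce': ['Actinobacillus actinomycetemcomitans']},
    #  'new': ['Quercus robur']}
```

The cutoff is set above `get_close_matches`'s default of 0.6 because candidates often share a genus with many known names: at 0.6, "Actinobacillus actionomyce" would also be suggested as a variant of "Actinobacillus lignieresii" (ratio about 0.69) purely on the shared "Actinobacillus ". A fuller approach might compare genus and epithet separately; in either case, the second stage leaves the final decision to a domain expert.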

Details

Language :
English
Database :
OpenAIRE
Journal :
BioCuration 2013
Accession number :
edsair.core.ac.uk....2799ba9b1a93f14970244219367540c4