Sentence identification of biological interactions using PATRICIA tree generated patterns and genetic algorithm optimized parameters

Authors :
Liu, Haibin
Blouin, Christian
Kešelj, Vlado
Source :
Data & Knowledge Engineering. Jan 2010, Vol. 69 Issue 1, p137-152. 16p.
Publication Year :
2010

Abstract

An important task in information retrieval is to identify sentences that contain important relationships between key concepts. In this work, we propose a novel approach to automatically extract sentence patterns that contain interactions involving concepts of molecular biology. A pattern is defined in this work as a sequence of specialized Part-of-Speech (POS) tags that capture the structure of key sentences in the scientific literature. Each candidate sentence for the classification task is encoded as a POS array and then aligned to a collection of pre-extracted patterns. The quality of the alignment is expressed as a pairwise alignment score. The most innovative component of this work is the use of a genetic algorithm (GA) to maximize the classification performance of the alignment scoring scheme. The system achieves an average F-score of 0.796 in identifying sentences which describe interactions between co-occurring biological concepts. This performance is mostly affected by the quality of the preprocessing steps such as term identification and POS tagging. [Copyright Elsevier]
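
Illustrative sketch (not from the paper): the abstract describes encoding a sentence as a POS array, aligning it to pre-extracted patterns, and scoring the alignment. The record does not state which alignment algorithm, tag set, or scoring values the authors use, so the Python below assumes a plain Needleman-Wunsch global alignment with hypothetical `match`, `mismatch`, and `gap` values and a hypothetical decision `threshold`. These are the kind of parameters a genetic algorithm could tune; the GA itself is not shown.

```python
from typing import Iterable, Sequence


def alignment_score(
    sentence_tags: Sequence[str],
    pattern_tags: Sequence[str],
    match: float = 2.0,      # assumed reward for identical tags
    mismatch: float = -1.0,  # assumed penalty for differing tags
    gap: float = -1.0,       # assumed penalty for skipping a tag
) -> float:
    """Global (Needleman-Wunsch) alignment score of two POS tag sequences."""
    m = len(pattern_tags)
    # prev[j] holds the best score for aligning the tags seen so far
    # in the sentence with the first j tags of the pattern.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(sentence_tags) + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            same = sentence_tags[i - 1] == pattern_tags[j - 1]
            curr[j] = max(
                prev[j - 1] + (match if same else mismatch),  # align two tags
                prev[j] + gap,      # sentence tag aligned to a gap
                curr[j - 1] + gap,  # pattern tag aligned to a gap
            )
        prev = curr
    return prev[m]


def is_interaction_sentence(
    sentence_tags: Sequence[str],
    patterns: Iterable[Sequence[str]],
    threshold: float,  # hypothetical cut-off; a GA could optimize it
) -> bool:
    """Flag a sentence if its best-scoring pattern reaches the threshold."""
    best = max(alignment_score(sentence_tags, p) for p in patterns)
    return best >= threshold
```

With the assumed defaults, a sentence encoded as `["NN", "VB", "NN"]` aligned to the identical pattern scores three matches, while aligning it to `["NN", "NN"]` costs one gap. The tag names here are generic placeholders, not the specialized tags the authors define.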

Details

Language :
English
ISSN :
0169-023X
Volume :
69
Issue :
1
Database :
Academic Search Index
Journal :
Data & Knowledge Engineering
Publication Type :
Academic Journal
Accession number :
45544537
Full Text :
https://doi.org/10.1016/j.datak.2009.09.002