Evaluating a variety of text-mined features for automatic protein function prediction

Authors :
Shah, N
Dumontier, M
Soldatova, L
Rocca-Serra, P
Funk, C
Kahanda, I
Ben-Hur, A
Verspoor, CM
Publication Year :
2015

Abstract

Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work, we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: Molecular Function = 0.408, Biological Process = 0.461, Cellular Component = 0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.

Details

Database :
OAIster
Publication Type :
Electronic Resource
Accession number :
edsoai.on1315723916
Document Type :
Electronic Resource