RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion

Authors :
Ruibang Luo
Junhao Su
Ye Wu
Tak-Wah Lam
Hing-Fung Ting
Source :
NAR Genomics and Bioinformatics
Publication Year :
2021
Publisher :
Oxford University Press (OUP), 2021.

Abstract

Background: Relation extraction is a fundamental task for extracting gene-disease associations from biomedical text. Existing tools have limited capacity, as they can extract gene-disease associations only from single sentences or abstracts.

Results: In this work, we propose RENET2, a deep learning-based relation extraction method that implements section filtering and ambiguous relation modeling to extract gene-disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset, addressing the scarcity of labels for full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene-disease associations from an annotated full-text dataset, which was 27.22%, 30.30% and 29.24% higher than the best existing tools BeFree, DTMiner and BioBERT, respectively. We applied RENET2 to (1) ~1.89M full-text articles from PMC, finding ~3.72M gene-disease associations; and (2) the LitCovid article set, ranking the top 15 proteins associated with COVID-19, a ranking supported by recent articles.

Conclusion: RENET2 is an efficient and accurate method for full-text gene-disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available at https://github.com/sujunhao/RENET2.
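The abstract reports its comparison as F1-scores. As a reminder of what that metric combines, here is a minimal sketch of the standard definition (the harmonic mean of precision and recall), computed from counts of true-positive, false-positive and false-negative gene-disease pairs. The function name and the counts in the usage line are hypothetical placeholders for illustration, not RENET2 code or data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall.

    tp, fp, fn: hypothetical counts of true-positive, false-positive and
    false-negative predicted gene-disease pairs.
    """
    # Precision: fraction of predicted pairs that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of annotated pairs that were predicted.
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder counts for illustration only, not RENET2 results.
print(f"{f1_score(tp=70, fp=30, fn=30):.4f}")  # prints 0.7000
```

Because F1 is a harmonic mean, a tool cannot score well by maximizing only precision or only recall; both must be high.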

Details

ISSN :
2631-9268
Volume :
3
Database :
OpenAIRE
Journal :
NAR Genomics and Bioinformatics
Accession number :
edsair.doi.dedup.....9a0b20db4b93cdef1bd8cc159d9645c0
Full Text :
https://doi.org/10.1093/nargab/lqab062