RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion
- Source :
- NAR Genomics and Bioinformatics
- Publication Year :
- 2021
- Publisher :
- Oxford University Press (OUP), 2021.
Abstract
- Background: Relation extraction is a fundamental task for extracting gene-disease associations from biomedical text. Existing tools have limited capacity, as they can extract gene-disease associations only from single sentences or abstract texts. Results: In this work, we propose RENET2, a deep learning-based relation extraction method, which implements section filtering and ambiguous relations modeling to extract gene-disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset to resolve the scarcity of labels on full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene-disease associations from an annotated full-text dataset, which was 27.22%, 30.30% and 29.24% higher than the best existing tools BeFree, DTMiner and BioBERT, respectively. We applied RENET2 to (1) ~1.89M full-text articles from PMC and found ~3.72M gene-disease associations; and (2) the LitCovid articles set and ranked the top 15 proteins associated with COVID-19, supported by recent articles. Conclusion: RENET2 is an efficient and accurate method for full-text gene-disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available at https://github.com/sujunhao/RENET2.
- Subjects :
- AcademicSubjects/SCI01140
AcademicSubjects/SCI01060
Computer science
Existential quantification
AcademicSubjects/SCI00030
MEDLINE
AcademicSubjects/SCI01180
computer.software_genre
Task (project management)
Set (abstract data type)
03 medical and health sciences
0302 clinical medicine
Text mining
Limited capacity
Association (psychology)
030304 developmental biology
0303 health sciences
Training set
business.industry
Deep learning
Relationship extraction
Methart
Task (computing)
Biomedical text
030220 oncology & carcinogenesis
AcademicSubjects/SCI00980
Artificial intelligence
business
computer
Natural language processing
Details
- ISSN :
- 2631-9268
- Volume :
- 3
- Database :
- OpenAIRE
- Journal :
- NAR Genomics and Bioinformatics
- Accession number :
- edsair.doi.dedup.....9a0b20db4b93cdef1bd8cc159d9645c0
- Full Text :
- https://doi.org/10.1093/nargab/lqab062