Enhancing the coverage of SemRep using a relation classification approach.

Authors :
Ming S
Zhang R
Kilicoglu H
Source :
Journal of biomedical informatics [J Biomed Inform] 2024 Jul; Vol. 155, pp. 104658. Date of Electronic Publication: 2024 May 21.
Publication Year :
2024

Abstract

Objective: Relation extraction is an essential task in the field of biomedical literature mining and offers significant benefits for various downstream applications, including database curation, drug repurposing, and literature-based discovery. The broad-coverage natural language processing (NLP) tool SemRep has established a solid baseline for extracting subject-predicate-object triples from biomedical text and has served as the backbone of the Semantic MEDLINE Database (SemMedDB), a PubMed-scale repository of semantic triples. While SemRep achieves reasonable precision (0.69), its recall is relatively low (0.42). In this study, we aimed to enhance SemRep using a relation classification approach, in order to eventually increase the size and the utility of SemMedDB.

Methods: We combined and extended existing SemRep evaluation datasets to generate training data. We leveraged the pre-trained PubMedBERT model, enhancing it through additional contrastive pre-training and fine-tuning. We experimented with three entity representations: mentions, semantic types, and semantic groups. We evaluated the model performance on a portion of the SemRep Gold Standard dataset and compared it to SemRep performance. We also assessed the effect of the model on a larger set of 12K randomly selected PubMed abstracts.

Results: Our results show that the best model yields a precision of 0.62, recall of 0.81, and F1 score of 0.70. Assessment on 12K abstracts shows that the model could double the size of SemMedDB, when applied to entire PubMed. We also manually assessed the quality of 506 triples predicted by the model that SemRep had not previously identified, and found that 67% of these triples were correct.

Conclusion: These findings underscore the promise of our model in achieving a more comprehensive coverage of relationships mentioned in biomedical literature, thereby showing its potential in enhancing various downstream applications of biomedical literature mining. Data and code related to this study are available at https://github.com/Michelle-Mings/SemRep_RelationClassification.

Competing Interests: Declaration of competing interest The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Halil Kilicoglu reports financial support was provided by National Library of Medicine. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

(Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.)
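
As a quick consistency check on the figures quoted in the abstract, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only and is not taken from the authors' repository; the function name `f1_score` is a hypothetical helper, and the SemRep F1 value is derived here from the stated precision and recall rather than reported in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the best model, as quoted in the abstract.
model_f1 = f1_score(0.62, 0.81)

# SemRep precision and recall, as quoted in the abstract.
# The abstract does not state SemRep's F1; this value is derived, not reported.
semrep_f1 = f1_score(0.69, 0.42)

print(f"model F1:  {model_f1:.2f}")
print(f"SemRep F1: {semrep_f1:.2f}")
```

The recomputed model score rounds to the 0.70 given in the Results, which suggests the reported precision, recall, and F1 are mutually consistent.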

Details

Language :
English
ISSN :
1532-0480
Volume :
155
Database :
MEDLINE
Journal :
Journal of biomedical informatics
Publication Type :
Academic Journal
Accession number :
38782169
Full Text :
https://doi.org/10.1016/j.jbi.2024.104658