GeDex: A consensus Gene-disease Event Extraction System based on frequency patterns and supervised learning
- Author
- Yalbi I. Balderas-Martínez, Soto LM, Julio Collado-Vides, Velázquez-Ramírez DA, Adrián Munguía-Reyes, Olayo-Alarcón R, and Carlos-Francisco Méndez-Cruz
- Subjects
Computer science, Event (computing), Supervised learning, Simple Features, Literature based, Disease, Artificial intelligence, Machine learning, Manual curation
- Abstract
Motivation: The genetic mechanisms involved in human diseases are fundamental in biomedical research. Several databases with curated associations between genes and diseases have emerged in the last decades. However, because manual curation of the literature is demanding and time-consuming, these databases still lack large amounts of information. Current automatic approaches extract associations by considering each abstract or sentence independently, which can lead to contradictions between individual cases. There is therefore a need for automatic strategies that provide a literature consensus of gene-disease associations and are not prone to contradictory predictions.
Results: Here, we present GeDex, an effective and freely available automatic approach to extract consensus gene-disease associations from biomedical literature, based on a predictive model trained with four simple features. As far as we know, it is the only system that reports a single consensus prediction from multiple sentences supporting the same association. We tested our approach on the curated fraction of DisGeNET (F-score 0.77) and validated it on a manually curated dataset, obtaining competitive performance compared with pre-existing methods (F-score 0.74). In addition, we effectively recovered associations from an article collection on chronic pulmonary diseases and found that a large proportion of them are not reported in current databases. Our results demonstrate that GeDex, despite its simplicity, is a competitive tool that can successfully assist the curation of existing databases.
Availability: GeDex is available at https://bitbucket.org/laigen/gedex/src/master/ and can be used as a Docker image: https://hub.docker.com/r/laigen/gedex
Contact: cmendezc@ccg.unam.mx
Supplementary information: Supplementary material is available at bioRxiv online.
- Published
- 2019