A disease inference method based on symptom extraction and bidirectional Long Short Term Memory networks
- Source :
- Methods (San Diego, Calif.). 173
- Publication Year :
- 2019
Abstract
- The wide application of automatic disease inference across many medical fields improves the efficiency of medical treatment. Many efforts have been made to predict patients' future health conditions from their full clinical texts, clinical measurements, or medical codes. Symptoms reflect the onset of diseases and can provide credible information for disease diagnosis. In this study, we propose a new disease inference method that extracts symptoms and integrates two symptom representation approaches. To reduce the uncertainty and irregularity of symptom descriptions in Electronic Medical Records (EMR), we extract symptoms with MetaMap, an existing natural language processing tool designed for biomedical texts, drawing on a comprehensive clinical knowledge database that contains a massive amount of data about diseases, symptoms, and their relationships. To take advantage of the complex relationships between symptoms and diseases and thereby improve the accuracy of disease inference, we present two symptom representation models: a term frequency-inverse document frequency (TF-IDF) model, which represents the relationships between symptoms and diseases, and Word2Vec, which captures the semantic relationships among symptoms. Based on these two symptom representations, we employ bidirectional Long Short-Term Memory networks (BiLSTMs) to model symptom sequences in EMR. Our proposed model shows a significant improvement, reaching an AUC of 0.895 and an F1 score of 0.572 for 50 diseases in the MIMIC-III dataset. The results show that the model combining the two symptom representations performs better than models using only one of them.
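The abstract uses TF-IDF to express how strongly each symptom is tied to a disease. Its exact formulation is not given in this record, so the following is only a minimal sketch under stated assumptions: each disease is treated as a "document" whose "terms" are the symptoms extracted from its records, and the textbook weighting tf(s, d) × log(N / df(s)) is used. The function name `tfidf_symptom_weights`, the disease names, and the symptom lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_symptom_weights(disease_symptoms):
    """Return {disease: {symptom: tf-idf weight}}.

    Assumption (for illustration only): each disease is a "document" and the
    symptoms extracted from its records are its "terms".
    """
    n_diseases = len(disease_symptoms)
    # Document frequency: number of diseases in which each symptom occurs.
    df = Counter()
    for symptoms in disease_symptoms.values():
        df.update(set(symptoms))

    weights = {}
    for disease, symptoms in disease_symptoms.items():
        counts = Counter(symptoms)
        total = len(symptoms)
        # Term frequency (relative count) times inverse document frequency.
        weights[disease] = {
            s: (c / total) * math.log(n_diseases / df[s])
            for s, c in counts.items()
        }
    return weights


# Hypothetical extracted symptoms per disease (invented for illustration).
records = {
    "pneumonia": ["fever", "cough", "cough", "dyspnea"],
    "heart failure": ["dyspnea", "edema", "edema", "fatigue"],
    "influenza": ["fever", "cough", "fatigue", "myalgia"],
}
weights = tfidf_symptom_weights(records)
print(round(weights["heart failure"]["edema"], 3))
```

Symptoms shared by several diseases (fever, cough) get low weights, while a symptom seen under only one disease (edema for heart failure) is weighted up; that discriminative signal is what a symptom-disease TF-IDF representation contributes.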
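The abstract states that BiLSTMs model the symptom sequence of each record but gives no architecture details. The following is a minimal, framework-free sketch of a standard bidirectional LSTM forward pass (untrained, randomly initialised weights): one LSTM reads the sequence left to right, another reads it right to left, and the two final hidden states are concatenated. It assumes, hypothetically, that each symptom's input vector is a Word2Vec-style embedding with the symptom's TF-IDF weight appended; the names `LSTMCell` and `bilstm_encode`, the sizes, and the feature values are all illustrative, not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Standard LSTM cell (forward pass only) with small random weights."""

    GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # each gate sees [x_t; h_{t-1}]
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * hidden_size for gate in self.GATES}

    def _affine(self, gate, z):
        # W_gate @ z + b_gate, written out with plain lists.
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = list(x) + list(h)
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        g = [math.tanh(a) for a in self._affine("g", z)]
        c_new = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Return the final hidden state after reading the whole sequence."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def bilstm_encode(sequence, forward_cell, backward_cell):
    """Concatenate the final forward state with the final backward state."""
    return forward_cell.run(sequence) + backward_cell.run(sequence[::-1])


# Hypothetical symptom sequence: a toy 3-d "embedding" followed by the
# symptom's TF-IDF weight (all values invented for illustration).
symptom_sequence = [
    [0.21, -0.40, 0.10, 0.10],   # fever
    [0.05, 0.33, -0.12, 0.20],   # cough
    [-0.30, 0.18, 0.44, 0.55],   # edema
]

rng = random.Random(0)
forward_cell = LSTMCell(input_size=4, hidden_size=3, rng=rng)
backward_cell = LSTMCell(input_size=4, hidden_size=3, rng=rng)
encoding = bilstm_encode(symptom_sequence, forward_cell, backward_cell)
print(len(encoding))  # 6 = 2 directions x hidden_size 3
```

Concatenating both directions means the record encoding summarizes the symptom sequence read in both orders. In practice such a model would be trained in a deep learning framework, with an output layer (for example, one sigmoid unit per disease) on top of the encoding; that head is likewise an assumption here.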
- Subjects :
- Computer science
Inference
Medical classification
Disease
computer.software_genre
General Biochemistry, Genetics and Molecular Biology
03 medical and health sciences
Electronic Health Records
Humans
Word2vec
Molecular Biology
030304 developmental biology
Natural Language Processing
0303 health sciences
business.industry
Medical record
030302 biochemistry & molecular biology
Representation (systemics)
Expression (mathematics)
Term (time)
Semantics
Memory, Short-Term
Artificial intelligence
Neural Networks, Computer
business
computer
Natural language processing
Algorithms
- ISSN :
- 1095-9130
- Volume :
- 173
- Database :
- OpenAIRE
- Journal :
- Methods (San Diego, Calif.)
- Accession number :
- edsair.doi.dedup.....4eb23239a9667a5102c10254133156a7