Comparative effectiveness of medical concept embedding for feature engineering in phenotyping
- Source :
- JAMIA Open
- Publication Year :
- 2020
Abstract
- Objective: Feature engineering is a major bottleneck in phenotyping. Properly learned medical concept embeddings (MCEs) capture the semantics of medical concepts and are therefore useful for retrieving relevant medical features in phenotyping tasks. We compared how well MCEs learned from knowledge graphs and MCEs learned from electronic health record (EHR) data retrieve relevant medical features for phenotyping tasks.
- Materials and Methods: We implemented 5 embedding methods (node2vec, singular value decomposition (SVD), LINE, skip-gram, and GloVe) with 2 data sources: (1) knowledge graphs obtained from the Observational Medical Outcomes Partnership (OMOP) Common Data Model; and (2) patient-level data obtained from the OMOP-compatible EHR of Columbia University Irving Medical Center (CUIMC). To evaluate how well the learned MCEs retrieve phenotype-relevant concepts, we used phenotypes, together with their relevant concepts, that were developed and validated by the Electronic Medical Records and Genomics (eMERGE) network. MCEs were scored by Hits@k% in retrieving phenotype-relevant concepts from a single seed concept and from multiple seed concepts.
- Results: Among all MCEs, those learned using node2vec with knowledge graphs performed best. Within each data source, the best-performing MCEs were those learned using node2vec for knowledge graphs and those learned using GloVe for EHR data.
- Conclusion: MCEs enable scalable feature engineering, thereby facilitating phenotyping. Under current phenotyping practices, MCEs learned from knowledge graphs constructed from hierarchical relationships among medical concepts outperformed MCEs learned from EHR data.
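The retrieval evaluation summarized in the abstract (ranking concepts by embedding similarity to one or more seed concepts and scoring the ranking with Hits@k%) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes cosine similarity for ranking, averaging of seed vectors when there are multiple seeds, and a cutoff of k% of the candidate pool (rounded up, at least 1); the paper's exact definitions may differ. The function names, concept names, and 2-D vectors are hypothetical toy values, not real MCEs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(embeddings, seeds, n):
    """Rank non-seed concepts by cosine similarity to the query; return the top n.

    Assumption (hypothetical): multiple seed concepts are combined by
    averaging their embedding vectors into a single query vector.
    """
    dim = len(embeddings[seeds[0]])
    query = [sum(embeddings[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    candidates = [c for c in embeddings if c not in seeds]
    candidates.sort(key=lambda c: cosine(query, embeddings[c]), reverse=True)
    return candidates[:n]


def hits_at_k_percent(embeddings, seeds, relevant, k_percent):
    """Fraction of relevant (non-seed) concepts found in the top k% of candidates.

    Assumption (hypothetical): the cutoff is k% of the candidate pool
    (all non-seed concepts), rounded up and never smaller than 1.
    """
    targets = set(relevant) - set(seeds)
    if not targets:
        return 0.0
    pool_size = len(embeddings) - len(set(seeds))
    n = max(1, math.ceil(pool_size * k_percent / 100))
    retrieved = retrieve(embeddings, seeds, n)
    return len(targets.intersection(retrieved)) / len(targets)


# Hypothetical toy embeddings for six illustrative concepts (not real MCEs).
toy_embeddings = {
    "type_2_diabetes": [1.0, 0.1],
    "metformin": [0.9, 0.2],
    "hba1c_test": [0.8, 0.3],
    "insulin": [0.7, 0.1],
    "femur_fracture": [0.1, 1.0],
    "appendectomy": [0.0, 0.9],
}
# Hypothetical phenotype-relevant concept set for a diabetes-like phenotype.
diabetes_concepts = {"metformin", "hba1c_test", "insulin"}

if __name__ == "__main__":
    # Single seed: the top 60% of 5 candidates (3 concepts) covers all 3 targets.
    print(hits_at_k_percent(toy_embeddings, ["type_2_diabetes"], diabetes_concepts, 60))
    # Multiple seeds: averaged query vector, top 50% of the 4 remaining candidates.
    print(hits_at_k_percent(toy_embeddings, ["type_2_diabetes", "metformin"], diabetes_concepts, 50))
```

With these toy vectors both calls print `1.0`; lowering `k_percent` to 20 in the single-seed case keeps only the single nearest neighbor and yields 1/3. Averaging seed vectors is only one plausible way to handle multiple seeds; ranking by the maximum or mean per-seed similarity would be equally reasonable alternatives.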
- Subjects :
- Feature engineering
0303 health sciences
Information retrieval
phenotyping
AcademicSubjects/SCI01060
Computer science
Columbia University
Health Informatics
Semantics
Research and Applications
Bottleneck
03 medical and health sciences
embedding
representation learning
0302 clinical medicine
electronic health records
Knowledge graph
Scalability
030212 general & internal medicine
AcademicSubjects/SCI01530
AcademicSubjects/MED00010
Feature learning
030304 developmental biology
Details
- ISSN :
- 2574-2531
- Volume :
- 4
- Issue :
- 2
- Database :
- OpenAIRE
- Journal :
- JAMIA Open
- Accession number :
- edsair.doi.dedup.....523d010de67057cf74d5c1d922041e2a