Analyzing Multiple Medical Corpora Using Word Embedding

Authors :
Keyang Xu
V. G. Vinod Vydiswaran
Jian Huang
Source :
ICHI
Publication Year :
2016
Publisher :
IEEE, 2016.

Abstract

Neural language models, such as word embedding, can effectively embed words into vector spaces and preserve linguistic regularities and semantic relationships. However, few researchers have shown their effectiveness on medical terms and relationships. In this paper, we study the applicability of word2vec, a well-known technique for word embedding, to embed medical terms and relations based on different medical text corpora, including biomedical abstracts of scientific papers, health-related discussion forums, and a commonly available general-purpose information resource. We empirically evaluate the applicability of this approach by studying how the word embedding projects certain classes of medical terms and relations to the word space and analyzing the differences between the three corpora for embedding medical terms and relations. Results show that the corpus of health-related discussion forum posts, authored by lay persons and medical novices, trains a comparable word embedding for popular medical terms, when compared against a professionally authored corpus of published biomedical abstracts.

Details

Database :
OpenAIRE
Journal :
2016 IEEE International Conference on Healthcare Informatics (ICHI)
Accession number :
edsair.doi...........157a8b7e40b7475bfe976c669f87da36