Analyzing Multiple Medical Corpora Using Word Embedding
- Source :
- ICHI
- Publication Year :
- 2016
- Publisher :
- IEEE, 2016.
- Abstract :
- Neural language models, such as word embedding, can effectively embed words into vector spaces and preserve linguistic regularities and semantic relationships. However, few researchers have shown their effectiveness on medical terms and relationships. In this paper, we study the applicability of word2vec, a well-known technique for word embedding, to embed medical terms and relations based on different medical text corpora, including biomedical abstracts of scientific papers, health-related discussion forums, and a commonly available general-purpose information resource. We empirically evaluate the applicability of this approach by studying how the word embedding projects certain classes of medical terms and relations to the word space and analyzing the differences between the three corpora for embedding medical terms and relations. Results show that the corpus of health-related discussion forum posts, authored by lay persons and medical novices, trains a comparable word embedding for popular medical terms, when compared against a professionally authored corpus of published biomedical abstracts.
- Subjects :
- Text corpus
Word embedding
Information retrieval
020205 medical informatics
Computer science
business.industry
02 engineering and technology
010501 environmental sciences
Semantics
computer.software_genre
01 natural sciences
0202 electrical engineering, electronic engineering, information engineering
Encyclopedia
Embedding
Word2vec
Artificial intelligence
Language model
business
computer
Natural language processing
Word (computer architecture)
0105 earth and related environmental sciences
- Database :
- OpenAIRE
- Journal :
- 2016 IEEE International Conference on Healthcare Informatics (ICHI)
- Accession number :
- edsair.doi...........157a8b7e40b7475bfe976c669f87da36
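
The abstract describes comparing word embeddings trained on different corpora. One way such a comparison could be sketched is to check how much the nearest neighbors of a medical term agree between two embedding spaces. The sketch below is illustrative only: the neighbor-overlap measure, the function names, and the tiny two-dimensional vectors are assumptions made for this example and are not taken from the paper. In practice the vectors would come from word2vec models trained separately on each corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(term, vectors, k):
    """Return the k terms most similar to `term` in one embedding space."""
    scored = [
        (cosine(vectors[term], vec), other)
        for other, vec in vectors.items()
        if other != term
    ]
    # Sort by descending similarity; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]

def neighbor_overlap(term, emb_a, emb_b, k=2):
    """Jaccard overlap of the top-k neighbors of `term` in two embeddings.

    Assumes `term` is present in both vocabularies.
    """
    a = set(nearest(term, emb_a, k))
    b = set(nearest(term, emb_b, k))
    return len(a & b) / len(a | b)

# Hypothetical toy vectors standing in for embeddings trained on two corpora.
abstracts_vectors = {
    "aspirin": [0.9, 0.1],
    "ibuprofen": [0.8, 0.2],
    "headache": [0.2, 0.9],
    "migraine": [0.1, 0.8],
}
forum_vectors = {
    "aspirin": [0.7, 0.3],
    "ibuprofen": [0.75, 0.2],
    "headache": [0.3, 0.8],
    "migraine": [0.2, 0.9],
}

if __name__ == "__main__":
    print(nearest("aspirin", abstracts_vectors, 1))
    print(neighbor_overlap("aspirin", abstracts_vectors, forum_vectors, k=1))
```

An overlap close to 1 for a term would suggest that the two corpora place it in a similar neighborhood, while a value close to 0 would suggest that the corpora use the term in different contexts.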