1. COS: A new MeSH term embedding incorporating corpus, ontology, and semantic predications
- Author
Wei Jin and Juncheng Ding
- Subjects
Computer science; Information Theory; Information Storage and Retrieval; Social Sciences; 02 engineering and technology; Ontology (information science); Cardiovascular Medicine; computer.software_genre; Translocation, Genetic; Medical Subject Headings; Database and Informatics Methods; Mathematical and Statistical Techniques; Medical Conditions; 0202 electrical engineering, electronic engineering, information engineering; Medicine and Health Sciences; Data Management; Complex data type; 0303 health sciences; Multidisciplinary; Mathematical Models; Statistics; Semantics; Cardiovascular Diseases; Physical Sciences; Embedding; Medicine; Natural language processing; Research Article; Computer and Information Sciences; Process (engineering); Abstracting and Indexing; Bioinformatics; Science; Cardiology; Research and Analysis Methods; Set (abstract data type); 03 medical and health sciences; 020204 information systems; Ontologies; Learning; Statistical Methods; 030304 developmental biology; Mesh term; business.industry; Computational Biology; Linguistics; Lexical Semantics; Biological Ontologies; Random Walk; Graph Theory; Artificial intelligence; business; computer; Mathematics; Forecasting
- Abstract
The embedding of Medical Subject Headings (MeSH) terms has become a foundation for many downstream bioinformatics tasks. Recent studies employ different data sources, such as the corpus (in which each document is indexed by a set of MeSH terms), the MeSH term ontology, and the semantic predications between MeSH terms (extracted by SemMedDB), to learn their embeddings. While these data sources contribute to learning the MeSH term embeddings, current approaches fail to incorporate all of them in the learning process. The challenge is that the structured relationships between MeSH terms are different across the data sources, and there is no approach to fusing such complex data into the MeSH term embedding learning. In this paper, we study the problem of incorporating corpus, ontology, and semantic predications to learn the embeddings of MeSH terms. We propose a novel framework, Corpus, Ontology, and Semantic predications-based MeSH term embedding (COS), to generate high-quality MeSH term embeddings. COS converts the corpus, ontology, and semantic predications into MeSH term sequences, merges these sequences, and learns MeSH term embeddings using the sequences. Extensive experiments on different datasets show that COS outperforms various baseline embeddings and traditional non-embedding-based baselines.
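The convert-and-merge step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how each source is turned into sequences, so the strategies here (one sequence per document, random walks over the ontology, one subject-object pair per predication), all function names, and the toy data are assumptions made for illustration.

```python
import random
from itertools import chain


def corpus_sequences(docs):
    """One sequence per document: the MeSH terms that index it (assumed strategy)."""
    return [list(terms) for terms in docs]


def ontology_sequences(parent_to_children, walk_length, walks_per_node, rng):
    """Random walks over the undirected parent/child graph (assumed strategy)."""
    neighbors = {}
    for parent, children in parent_to_children.items():
        for child in children:
            neighbors.setdefault(parent, []).append(child)
            neighbors.setdefault(child, []).append(parent)
    walks = []
    for start in sorted(neighbors):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(neighbors[walk[-1]]))
            walks.append(walk)
    return walks


def predication_sequences(triples):
    """One two-term sequence per (subject, predicate, object); the predicate is dropped here."""
    return [[subj, obj] for subj, _pred, obj in triples]


def merge_sequences(*sources):
    """Concatenate the sequence lists from all sources into one training set."""
    return list(chain.from_iterable(sources))


# Hypothetical toy data, not taken from the paper.
docs = [["Hypertension", "Stroke"], ["Stroke", "Aspirin"]]
ontology = {"Cardiovascular Diseases": ["Hypertension", "Stroke"]}
triples = [("Aspirin", "TREATS", "Stroke")]

rng = random.Random(0)
sequences = merge_sequences(
    corpus_sequences(docs),
    ontology_sequences(ontology, walk_length=4, walks_per_node=2, rng=rng),
    predication_sequences(triples),
)
```

The merged `sequences` list could then be passed to any sequence-based embedding learner, such as a skip-gram model, to obtain one vector per MeSH term; which learner COS uses is not stated in the abstract.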
- Published
- 2021