HierCon: Hierarchical Organization of Technical Documents Based on Concepts
- Source :
- ICDM
- Publication Year :
- 2019
- Publisher :
- IEEE, 2019.
Abstract
- In this work we study the hierarchical organization of technical documents: given a set of documents and a hierarchy of categories, the goal is to assign each document to its corresponding categories. Prior work on supervised hierarchical document categorization relies on a large amount of labeled training data, which is expensive to obtain in closed technical domains and tends to go stale as new knowledge emerges. We instead study this problem in a weak supervision setting, leveraging semantic information from concepts. The core idea is to project both documents and categories into a common concept embedding space, where their fine-grained similarity can be easily and effectively computed. Experiments over real-world datasets from the subjects of computer science, physics & mathematics, and medicine demonstrate the superior performance of our approach over a wide range of state-of-the-art baseline approaches.
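The core idea in the abstract (placing documents and categories in one concept embedding space and comparing them there) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how concepts are extracted, embedded, or aggregated. The toy concept vectors, the averaging step, the cosine similarity, the greedy top-down traversal, and the names `CONCEPT_VECS`, `HIERARCHY`, `embed`, and `assign` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy concept embeddings, invented for illustration only.
CONCEPT_VECS = {
    "neural network": np.array([1.0, 0.0, 0.0]),
    "gradient descent": np.array([0.9, 0.1, 0.0]),
    "quantum field": np.array([0.0, 1.0, 0.0]),
    "clinical trial": np.array([0.0, 0.0, 1.0]),
}

# Hypothetical hierarchy: category -> (describing concepts, child categories).
HIERARCHY = {
    "root": ([], ["computer science", "physics", "medicine"]),
    "computer science": (["neural network", "gradient descent"], ["machine learning"]),
    "machine learning": (["neural network"], []),
    "physics": (["quantum field"], []),
    "medicine": (["clinical trial"], []),
}


def embed(concepts):
    """Average the vectors of known concepts and L2-normalize the result."""
    vecs = [CONCEPT_VECS[c] for c in concepts if c in CONCEPT_VECS]
    if not vecs:
        return np.zeros(3)
    v = np.mean(vecs, axis=0)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def assign(doc_concepts, node="root"):
    """Assumed greedy top-down assignment: at each level, follow the child
    whose concept embedding has the highest cosine similarity to the document."""
    doc_vec = embed(doc_concepts)
    path = []
    while HIERARCHY[node][1]:
        children = HIERARCHY[node][1]
        # Both vectors are unit length (or zero), so the dot product is the cosine.
        scores = [float(doc_vec @ embed(HIERARCHY[c][0])) for c in children]
        node = children[int(np.argmax(scores))]
        path.append(node)
    return path
```

Under these assumptions, `assign(["gradient descent", "neural network"])` returns `["computer science", "machine learning"]`, and `assign(["clinical trial"])` returns `["medicine"]`. No labeled training documents are needed in the sketch: the only supervision is the set of concepts attached to each category, which is the sense in which the setting is weakly supervised.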
- Subjects :
- Hierarchy
Information retrieval
Hierarchy (mathematics)
Computer science
Subject (documents)
Concept mining
02 engineering and technology
Technical documentation
Domain (software engineering)
Set (abstract data type)
Categorization
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Embedding
Hierarchical organization
020201 artificial intelligence & image processing
Details
- Database :
- OpenAIRE
- Journal :
- 2019 IEEE International Conference on Data Mining (ICDM)
- Accession number :
- edsair.doi...........7846e99df938951b72091fb5019626dd