HierCon: Hierarchical Organization of Technical Documents Based on Concepts

Authors :
Hanwen Zha
Yu Su
Shiyang Li
Keqian Li
Xifeng Yan
Semih Yavuz
Source :
ICDM
Publication Year :
2019
Publisher :
IEEE, 2019.

Abstract

In this work, we study the hierarchical organization of technical documents: given a set of documents and a hierarchy of categories, the goal is to assign each document to its corresponding categories. Prior work on supervised hierarchical document categorization relies on large amounts of labeled training data, which are expensive to obtain in closed technical domains and tend to become stale as new knowledge emerges. We instead study this problem in a weak supervision setting, leveraging semantic information from concepts. The core idea is to project both documents and categories into a common concept embedding space, where their fine-grained similarity can be computed easily and effectively. Experiments on real-world datasets from computer science, physics & mathematics, and medicine demonstrate that our approach outperforms a wide range of state-of-the-art baselines.
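
The core idea in the abstract, comparing documents and categories in a shared concept embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the concept vectors, the averaging step, the cosine similarity measure, and the names CONCEPT_VECS, embed, cosine, and assign are all assumptions made for illustration, and the example uses a flat category list rather than a full hierarchy.

```python
import math

# Toy 2-d concept embeddings. The values are invented for illustration;
# a real system would learn them from a corpus.
CONCEPT_VECS = {
    "neural network": (0.9, 0.1),
    "gradient descent": (0.8, 0.2),
    "machine learning": (0.85, 0.15),
    "b-tree": (0.1, 0.9),
    "query optimization": (0.2, 0.8),
    "databases": (0.15, 0.85),
}


def embed(concepts):
    """Represent a document or category as the mean of its concept vectors."""
    vecs = [CONCEPT_VECS[c] for c in concepts if c in CONCEPT_VECS]
    if not vecs:
        raise ValueError("no known concepts to embed")
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign(doc_concepts, categories):
    """Return the name of the category most similar to the document."""
    doc_vec = embed(doc_concepts)
    return max(categories, key=lambda name: cosine(doc_vec, embed(categories[name])))


# Each category is described only by concepts, not by labeled documents.
CATEGORIES = {
    "machine learning": ["machine learning"],
    "databases": ["databases"],
}

print(assign(["neural network", "gradient descent"], CATEGORIES))
print(assign(["b-tree", "query optimization"], CATEGORIES))
```

Because categories are described by concepts rather than by labeled examples, no per-category training documents are needed in this sketch, which is the sense in which the setting is weakly supervised.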

Details

Database :
OpenAIRE
Journal :
2019 IEEE International Conference on Data Mining (ICDM)
Accession number :
edsair.doi...........7846e99df938951b72091fb5019626dd