1. A Comprehensive Evaluation of Metadata-Based Features to Classify Research Paper’s Topics
- Author
- Ghulam Mustafa, Muhammad Usman, Muhammad Tanvir Afzal, Abdul Shahid, and Anis Koubaa
- Subjects
Word2Vector (W2V), decision tree (DT), association of computing machinery (ACM), metadata, Electrical engineering. Electronics. Nuclear engineering, k-nearest neighbor’s (KNN), Research paper classification, TK1-9971
- Abstract
Existing document classification techniques draw on different data sources, taken either from the content or from the metadata of research articles. Journal publishers such as Springer, Elsevier, and IEEE do not provide open access to the content of research articles, whereas the metadata is freely available. Metadata such as title, keywords, and abstract can serve as a better alternative to the content in various scenarios. In the current literature, researchers have assessed the role of some metadata fields individually. We believe that the collective contribution of metadata parameters can play a significant role in classifying research papers. This paper presents a comprehensive evaluation of the role of metadata, individually as well as in combination, in research paper classification. Moreover, we have classified the research articles into ACM hierarchy root categories (e.g., general literature, hardware, software). In this evaluation, we have assessed all possible combinations of metadata features against different classifiers such as Random Forest, K-Nearest Neighbor, and Decision Tree. The results reveal that the title and keywords combination outperforms other combinations, with an F-measure score of 0.88.
- Published
- 2021
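
The abstract describes assessing every combination of metadata features and reporting an F-measure. As a rough illustration only, the sketch below enumerates the non-empty combinations of three fields and computes a balanced F-measure. The field names, the helper functions, and the choice of the balanced (F1) form are assumptions made here for illustration; the record does not state how the authors implemented either step.

```python
from itertools import combinations

# Metadata fields named in the abstract; the identifiers are illustrative.
FIELDS = ("title", "keywords", "abstract")


def feature_combinations(fields=FIELDS):
    """Return every non-empty subset of the fields, smallest subsets first."""
    return [
        combo
        for size in range(1, len(fields) + 1)
        for combo in combinations(fields, size)
    ]


def combine_text(record, combo):
    """Join the chosen fields of one paper record into a single string."""
    return " ".join(record[field] for field in combo)


def f_measure(precision, recall):
    """Balanced F-measure (assumed F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical record, made up for this example.
    paper = {
        "title": "example title",
        "keywords": "metadata classification",
        "abstract": "example abstract text",
    }
    for combo in feature_combinations():
        print(combo, "->", combine_text(paper, combo))
```

With three fields this yields seven combinations. In an experiment of the kind the abstract describes, each combined string would be vectorized and passed to each classifier, and the resulting F-measure scores compared across combinations.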