PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification

Authors :
Tan Yue
Yong Li
Xuzhao Shi
Jiedong Qin
Zijiao Fan
Zonghai Hu
Source :
Applied Sciences, Vol 12, Iss 9, p 4554 (2022)
Publication Year :
2022
Publisher :
MDPI AG, 2022.

Abstract

Document classification is an important area in Natural Language Processing (NLP). Because a huge number of scientific papers are being published at an accelerating rate, intelligent paper classification, especially fine-grained classification, is beneficial for researchers. However, a public scientific paper dataset for fine-grained classification is still lacking, so existing document classification methods have not been tested on this task. To fill this gap, we designed and collected the PaperNet-Dataset, which consists of multi-modal data (texts and figures). PaperNet version 1.0 contains hierarchical categories of papers in the fields of computer vision (CV) and NLP: 2 coarse-grained and 20 fine-grained (7 in CV and 13 in NLP). We ran current mainstream models on the PaperNet-Dataset, along with a multi-modal method that we propose. Interestingly, none of these methods reaches an accuracy of 80% in fine-grained classification, showing plenty of room for improvement. We hope that the PaperNet-Dataset will inspire more work in this challenging area.

Details

Language :
English
ISSN :
2076-3417
Volume :
12
Issue :
9
Database :
Directory of Open Access Journals
Journal :
Applied Sciences
Publication Type :
Academic Journal
Accession number :
edsdoj.6001215992d4dbdadc2c7221208fcb5
Document Type :
article
Full Text :
https://doi.org/10.3390/app12094554