Building a Multimodal Dataset of Academic Paper for Keyword Extraction.
- Author
- Zhang, Jingyu, Yan, Xinyi, Xiang, Yi, Zhang, Yingyi, and Zhang, Chengzhi
- Subjects
- DATA mining; TEXT mining; KEYWORDS; PREDICTION models; DATA fusion (Statistics)
- Abstract
To date, the keyword extraction task has typically relied solely on textual data. Neglecting the visual details and audio features available in the image and audio modalities reduces information richness and overlooks potential cross-modal correlations, constraining both the model's ability to learn data representations and the accuracy of its predictions. Furthermore, multimodal datasets for keyword extraction are particularly scarce, which further hinders progress in research on multimodal keyword extraction. Therefore, this study constructs a multimodal dataset of academic papers consisting of 1,000 samples, each containing paper text, images, audio, and keywords. Using both unsupervised and supervised keyword extraction methods, experiments are conducted on the textual data of the papers as well as on text extracted from the images and audio, in order to investigate how keyword extraction performance differs across individual modalities and under the fusion of multimodal information. The experimental results indicate that text from different modalities exhibits distinct characteristics in the model, and that concatenating paper text, image text, and audio text can effectively enhance keyword extraction performance on academic papers.
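The fusion strategy the abstract reports, concatenating the paper's own text with text recovered from its images (e.g., via OCR) and audio (e.g., via speech recognition) before extraction, can be illustrated with a minimal sketch. The snippet below is not the authors' code: TF-IDF stands in for whichever unsupervised extractor the study used, and the record fields (`paper_text`, `image_text`, `audio_text`) are hypothetical names.

```python
# Minimal sketch of multimodal fusion by text concatenation, followed by an
# unsupervised keyword extractor (TF-IDF as a stand-in method).
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(documents: list[str], top_k: int = 5) -> list[list[str]]:
    """Score unigram/bigram candidates per document by TF-IDF; keep the top_k."""
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    tfidf = vectorizer.fit_transform(documents)
    terms = vectorizer.get_feature_names_out()
    keywords = []
    for row in tfidf:                      # one sparse row per document
        scores = row.toarray().ravel()
        top = scores.argsort()[::-1][:top_k]
        keywords.append([terms[i] for i in top if scores[i] > 0])
    return keywords

# Hypothetical sample: one record of the kind the dataset described contains.
sample = {
    "paper_text": "We study multimodal keyword extraction from academic papers ...",
    "image_text": "Figure 1: architecture of the multimodal fusion model ...",   # e.g., OCR output
    "audio_text": "In this talk we present a dataset for keyword extraction ...",  # e.g., ASR output
}

# Fusion by simple concatenation of the three modality texts, as reported.
fused = " ".join([sample["paper_text"], sample["image_text"], sample["audio_text"]])
print(extract_keywords([fused]))
```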
- Published
- 2024