Self-Supervised Hypergraph Learning for Enhanced Multimodal Representation

Authors :: Hongji Shu; Chaojun Meng; Pasquale De Meo; Qing Wang; Jia Zhu
Source :: IEEE Access, Vol 12, Pp 20830-20839 (2024)
Publication Year :: 2024
Publisher :: IEEE, 2024.
Abstract: Hypergraph neural networks have become popular for capturing complex correlations between data items in multimodal datasets. In this study, we propose a novel approach, the self-supervised hypergraph learning (SHL) framework, which extracts hypergraph features to improve multimodal representation. Our method uses a dual embedding strategy and leverages self-supervised hypergraph learning to improve the accuracy and robustness of the model. To this end, we employ a hypergraph learning framework that extracts global context by capturing rich inter-modal dependencies. Additionally, we introduce a novel self-supervised learning (SSL) component that exploits the interaction graph data, further strengthening the model's robustness. By jointly optimizing hypergraph feature extraction and SSL, SHL significantly improves performance on multimodal representation tasks. To validate the effectiveness of our approach, we construct two comprehensive multimodal micro-video recommendation datasets from publicly available data (TikTok and MovieLens-10M). Before constructing the datasets, we carefully handle invalid entries and outliers, and we complete missing modality information using external auxiliary sources such as YouTube. These datasets are publicly available to the research community for evaluation purposes. Experimental results on these recommendation datasets show that SHL outperforms state-of-the-art baselines on multimodal representation tasks.
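The abstract does not state SHL's equations. For orientation only, here is a minimal NumPy sketch of two generic building blocks that the described ideas resemble: a standard HGNN-style hypergraph convolution (Feng et al., 2019), X' = ReLU(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ), and an InfoNCE contrastive loss of the kind often used for self-supervised learning. Whether SHL uses either one is an assumption. The function names `hypergraph_conv` and `info_nce`, the temperature `tau`, the hyperedge-dropping augmentation, and the toy incidence matrix are all hypothetical.

```python
import numpy as np


def hypergraph_conv(X, H, Theta, w=None):
    """One HGNN-style hypergraph convolution layer (generic, not SHL-specific).

    X:     (n_nodes, d_in) node features
    H:     (n_nodes, n_edges) incidence matrix, H[v, e] = 1 if node v is in hyperedge e
    Theta: (d_in, d_out) projection matrix
    w:     (n_edges,) hyperedge weights; defaults to all ones
    """
    n_nodes, n_edges = H.shape
    if w is None:
        w = np.ones(n_edges)
    dv = H @ w           # weighted node degrees d(v) = sum_e w(e) h(v, e)
    de = H.sum(axis=0)   # hyperedge degrees d(e) = number of nodes in e
    # Safe inverses: isolated nodes or empty hyperedges get 0 instead of inf.
    dv_inv_sqrt = np.zeros_like(dv, dtype=float)
    dv_inv_sqrt[dv > 0] = 1.0 / np.sqrt(dv[dv > 0])
    de_inv = np.zeros_like(de, dtype=float)
    de_inv[de > 0] = 1.0 / de[de > 0]
    # Propagation matrix P = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    left = (dv_inv_sqrt[:, None] * H) * (w * de_inv)[None, :]
    P = left @ (H.T * dv_inv_sqrt[None, :])
    return np.maximum(P @ X @ Theta, 0.0)  # ReLU activation


def info_nce(z1, z2, tau=0.2):
    """InfoNCE loss where row i of z1 and row i of z2 form a positive pair."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy hypergraph: 5 nodes, 3 hyperedges (purely illustrative).
    H = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
    X = rng.normal(size=(5, 8))
    Theta = rng.normal(size=(8, 4))
    Z = hypergraph_conv(X, H, Theta)
    # Second "view": drop the last hyperedge as a simple augmentation.
    Z_aug = hypergraph_conv(X, H[:, :2], Theta)
    print(Z.shape, info_nce(Z, Z_aug))
```

Because P is built as A W' A^T with A = Dv^{-1/2} H and a diagonal W', it is symmetric, and a node that belongs to no hyperedge receives an all-zero output row. Joint training would add the contrastive term to a task loss; how SHL weights or combines its terms is not specified in the abstract.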