MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
- Author
Ubaid Ullah and Hyun-Chul Choi
- Subjects
music–image, cross-modality, neural networks, multi-modality
- Abstract
Cross-modality understanding is essential for AI to tackle complex tasks that require both deterministic and generative capabilities, such as correlating music and visual art. Existing state-of-the-art audio–visual correlation methods often rely on single-dimensional information, focusing on either semantic or emotional attributes, and thus fail to capture the full depth of these inherently complex modalities. Addressing this limitation, we introduce a novel approach that treats music–image correlation as multilayered rather than as a direct one-to-one correspondence. To this end, we present a pioneering dataset with two segments: an artistic segment that pairs music with art based on both emotional and semantic attributes, and a realistic segment that links music with images through affective–semantic layers. In modeling emotional layers for the artistic segment, we found traditional 2D affective models inadequate, prompting us to propose a more interpretable hybrid-emotional rating system that serves both experts and non-experts. For the realistic segment, we utilize a tagged web-based dataset, dividing the tag information into semantic and affective components to ensure a balanced and nuanced representation of music–image correlation. We conducted an in-depth statistical analysis and a user study to evaluate our dataset's effectiveness and applicability for AI-driven understanding. This work provides a foundation for advanced exploration of the complex relationships between auditory and visual art modalities, advancing the development of more sophisticated cross-modal AI systems.
- Published
- 2024