1. AI-based Chinese-style music generation from video content: a study on cross-modal analysis and generation methods
- Author
Moxi Cao, Jiaxiang Zheng, and Chongbin Zhang
- Subjects
Latent Diffusion Model, Music generation, AI composition, Deep learning, Traditional Chinese music, Acoustics. Sound (QC221-246), Electronic computers. Computer science (QA75.5-76.95)
- Abstract
In recent years, Artificial Intelligence Generated Content (AIGC) technologies have advanced rapidly, with models such as Stable Diffusion and GPT garnering significant attention across various domains. Against this backdrop, AI-driven music composition techniques have also made significant progress. However, no existing model has yet demonstrated the capability to generate Chinese-style music corresponding to Chinese-style videos. To address this gap, this study proposes a novel Chinese-style video music generation model based on the Latent Diffusion Model (LDM) and Diffusion Transformers (DiT). Experimental results demonstrate that the proposed model generates Chinese-style music from Chinese-style videos and achieves performance comparable to baseline models in audio quality, distribution fitting, musicality, rhythmic stability, and audio-visual synchronization. These findings indicate that the model captures the stylistic features of Chinese music. This research not only demonstrates the feasibility of applying artificial intelligence to music creation but also provides a new technological approach for preserving and innovating traditional Chinese music culture in the digital era. Furthermore, it explores new possibilities for the dissemination and innovation of Chinese cultural arts in the digital age.
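The abstract names an LDM denoised by a DiT and conditioned on video, but gives no implementation details. The sketch below (PyTorch) is only a minimal illustration of that general pattern: DiT-style blocks over noisy audio-latent tokens, with video features injected through cross-attention and trained with a standard noise-prediction objective. All module names, dimensions, and the simplified forward-diffusion step are assumptions for illustration, not the authors' architecture.

```python
# Illustrative sketch only: video-conditioned DiT denoiser for audio latents.
import torch
import torch.nn as nn

class VideoConditionedDiTBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, video_tokens):
        # Self-attention over noisy audio-latent tokens.
        h = self.n1(x)
        x = x + self.self_attn(h, h, h)[0]
        # Cross-attention injects video-derived conditioning (the cross-modal link).
        h = self.n2(x)
        x = x + self.cross_attn(h, video_tokens, video_tokens)[0]
        return x + self.mlp(self.n3(x))

class LatentDenoiser(nn.Module):
    def __init__(self, dim=256, depth=4):
        super().__init__()
        self.time_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.blocks = nn.ModuleList(VideoConditionedDiTBlock(dim) for _ in range(depth))
        self.out = nn.Linear(dim, dim)

    def forward(self, noisy_latents, t, video_tokens):
        # Very simplified timestep embedding, broadcast over all latent tokens.
        x = noisy_latents + self.time_embed(t.view(-1, 1).float())[:, None, :]
        for blk in self.blocks:
            x = blk(x, video_tokens)
        return self.out(x)  # predicted noise, as in standard LDM training

if __name__ == "__main__":
    B, T_audio, T_video, D = 2, 64, 32, 256
    model = LatentDenoiser(dim=D)
    audio_latents = torch.randn(B, T_audio, D)   # stand-in for a pretrained audio VAE's latents (assumed)
    video_tokens = torch.randn(B, T_video, D)    # stand-in for pretrained video-encoder features (assumed)
    t = torch.randint(0, 1000, (B,))
    noise = torch.randn_like(audio_latents)
    noisy = audio_latents + noise                # placeholder for the true diffusion forward process
    pred = model(noisy, t, video_tokens)
    loss = torch.nn.functional.mse_loss(pred, noise)  # epsilon-prediction objective
    print(loss.item())
```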
- Published
- 2025