Start Over

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Authors :: Zhang, Yixiao
Ikemiya, Yukara
Choi, Woosung
Murata, Naoki
Martínez-Ramírez, Marco A.
Lin, Liwei
Xia, Gus
Liao, Wei-Hsiang
Mitsufuji, Yuki
Dixon, Simon
Publication Year :: 2024
Abstract: Recent advances in text-to-music editing, which employ text queries to modify music (e.g.\ by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; other research uses large language models to predict edited music, resulting in imprecise audio reconstruction. To Combine the strengths and address these limitations, we introduce Instruct-MusicGen, a novel approach that finetunes a pretrained MusicGen model to efficiently follow editing instructions such as adding, removing, or separating stems. Our approach involves a modification of the original MusicGen architecture by incorporating a text fusion module and an audio fusion module, which allow the model to process instruction texts and audio inputs concurrently and yield the desired edited music. Remarkably, Instruct-MusicGen only introduces 8% new parameters to the original MusicGen model and only trains for 5K steps, yet it achieves superior performance across all tasks compared to existing baselines, and demonstrates performance comparable to the models trained for specific tasks. This advancement not only enhances the efficiency of text-to-music editing but also broadens the applicability of music language models in dynamic music production environments.<br />Comment: Code and demo are available at: https://github.com/ldzhangyx/instruct-musicgen

Subjects :: Computer Science - Sound
Computer Science - Artificial Intelligence
Computer Science - Machine Learning
Computer Science - Multimedia
Electrical Engineering and Systems Science - Audio and Speech Processing

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2405.18386
Document Type :: Working Paper

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Abstract

Subjects

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Abstract

Subjects

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources