The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

Authors :: Manco, Ilaria
Weck, Benno
Doh, SeungHeon
Won, Minz
Zhang, Yixiao
Bogdanov, Dmitry
Wu, Yusong
Chen, Ke
Tovstogan, Philip
Benetos, Emmanouil
Quinton, Elio
Fazekas, György
Nam, Juhan
Publication Year :: 2023
Abstract: We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.
Comment: Accepted to NeurIPS 2023 Workshop on Machine Learning for Audio

Subjects :: Computer Science - Sound
Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Electrical Engineering and Systems Science - Audio and Speech Processing
