Music detection from broadcast contents using convolutional neural networks with a Mel-scale kernel

Authors :: Byeong-Yong Jang
Woon-Haeng Heo
Jung-Hyun Kim
Oh-Wook Kwon
Source :: EURASIP Journal on Audio, Speech, and Music Processing, Vol 2019, Iss 1, Pp 1-12 (2019)
Publication Year :: 2019
Publisher :: SpringerOpen, 2019.
Abstract: We propose a new method for music detection from broadcast content using convolutional neural networks with a Mel-scale kernel. In this detection task, music segments must be annotated in broadcast data in which music, speech, and noise are mixed. The convolutional neural network includes a convolutional layer with a kernel that is trained to extract robust features. The Mel scale changes the kernel size, and the backpropagation algorithm trains the kernel shape. We used 52 h of mixed broadcast data (25 h of music) to train the convolutional network and 24 h of collected broadcast data (music ratio of 50–76%) for testing. The test data consisted of various genres (drama, documentary, news, kids, reality, and so on) broadcast in British English, Spanish, and Korean. The proposed method consistently outperformed the baseline system in all three languages, with F-scores ranging from 86.5% for British data to 95.9% for Korean drama data. Our music detection system takes about 28 s to process a 1-min signal using a single CPU with 4 cores.
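
Note: The abstract does not give the rule by which the Mel scale sets the kernel size. As a rough illustration only, the sketch below uses the common formula mel = 2595 log10(1 + f/700) to show why bands spaced evenly on the Mel scale grow wider in Hz at higher frequencies. The function names, the 8-band count, and the 0 to 8000 Hz range are assumptions made for this example and are not taken from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min=0.0, f_max=8000.0):
    """Return n_bands + 2 edge frequencies (Hz), evenly spaced in mels.

    Band i spans edges[i] .. edges[i + 2], as in a triangular
    Mel filter bank. All parameter values here are illustrative.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_bands + 2)]


# Hypothetical setup: 8 bands between 0 and 8000 Hz.
edges = mel_band_edges(n_bands=8)

# Width of each band in Hz; equal in mels, but growing in Hz.
widths = [edges[i + 2] - edges[i] for i in range(len(edges) - 2)]

for i, w in enumerate(widths):
    print(f"band {i}: {edges[i]:8.1f} Hz .. {edges[i + 2]:8.1f} Hz (width {w:7.1f} Hz)")
```

Under these assumptions, a kernel sized to cover one Mel band would span few frequency bins at low frequencies and many at high frequencies. Whether the paper sizes its kernels in exactly this way cannot be determined from the abstract.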

Subjects :: Music detection
Music segmentation
Convolutional neural networks
Mel-scale filter bank
Acoustics. Sound
QC221-246
Electronic computers. Computer science
QA75.5-76.95
