
Long-Term Multi-band Frequency-Domain Mean-Crossing Rate (FDMCR): A Novel Feature Extraction Algorithm for Speech/Music Discrimination.

Authors :
Kahrizi, Mohammad Rasoul
Kabudian, Seyed Jahanshah
Source :
Circuits, Systems & Signal Processing. Nov2023, Vol. 42 Issue 11, p6929-6950. 22p.
Publication Year :
2023

Abstract

Multimedia data have increased dramatically today, making the distinction between desirable information and other types of information extremely important. Speech/music discrimination is a field of audio analytics that aims to detect and classify speech and music segments in an audio file. This paper proposes a novel feature extraction method called Long-Term Multi-band Frequency-Domain Mean-Crossing Rate (FDMCR). The proposed feature computes the average frequency-domain mean-crossing rate along the frequency axis for each of the perceptual Mel-scaled frequency bands of the signal power spectrum. In this paper, the class-separation capability of this feature is first measured by well-known divergence criteria such as Maximum Fisher Discriminant Ratio (MFDR), Bhattacharyya divergence, and Jeffreys/Symmetric Kullback–Leibler (SKL) divergence. The proposed feature is then applied to the speech/music discrimination (SMD) process on two well-known speech-music datasets: GTZAN and S&S (Scheirer and Slaney). The results obtained on the two datasets using conventional classifiers, including k-NN, GMM, and SVM, as well as deep learning-based classification methods, including CNN, LSTM, and BiLSTM, show that the proposed feature outperforms other features in speech/music discrimination. [ABSTRACT FROM AUTHOR]
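
The abstract describes the feature only in outline, so the following Python sketch is an illustration of the general idea rather than the authors' algorithm. It assumes a Hann-windowed short-time FFT, band edges spaced uniformly on the common 2595·log10(1 + f/700) Mel scale, a crossing count taken against each band's own mean power, and a plain average over frames as the "long-term" step. The function names (`mean_crossing_rate`, `fdmcr_sketch`) and all parameter defaults are hypothetical and do not come from the paper.

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel-scale formula (assumption: the paper may use a different variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mean_crossing_rate(x):
    """Fraction of adjacent pairs in x that lie on opposite sides of the mean of x."""
    x = np.asarray(x, dtype=float)
    if x.size < 2:
        return 0.0  # too few bins to define a crossing
    above = x > x.mean()
    crossings = np.count_nonzero(above[1:] != above[:-1])
    return crossings / (x.size - 1)


def fdmcr_sketch(signal, sr, n_fft=512, hop=256, n_bands=8):
    """Hypothetical sketch: per-band mean-crossing rate along the frequency
    axis of each frame's power spectrum, averaged over all frames."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - n_fft) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one analysis frame")

    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    # Band edges equally spaced in Mel between 0 Hz and the Nyquist frequency.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_bands + 1))
    # Assign every FFT bin to a band index in [0, n_bands - 1].
    band_idx = np.clip(np.searchsorted(edges_hz, freqs, side="right") - 1, 0, n_bands - 1)

    rates = np.zeros((n_frames, n_bands))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + n_fft] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            rates[t, b] = mean_crossing_rate(power[band_idx == b])

    # "Long-term" step (assumption): average the per-frame rates over the segment.
    return rates.mean(axis=0)
```

Calling `fdmcr_sketch(audio, sr)` on a mono segment returns one value per band, each between 0 and 1. Such a vector could then be passed to a classifier of the kind listed in the abstract, though the paper's actual framing, band count, and aggregation may differ from these assumptions.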

Details

Language :
English
ISSN :
0278081X
Volume :
42
Issue :
11
Database :
Academic Search Index
Journal :
Circuits, Systems & Signal Processing
Publication Type :
Academic Journal
Accession number :
172805522
Full Text :
https://doi.org/10.1007/s00034-023-02440-0