Self-labeling with feature transfer for speech emotion recognition.

Authors :: Wen, Guihua
Liao, Huiqiang
Li, Huihui
Wen, Pengchen
Zhang, Tong
Gao, Sande
Wang, Bao
Source :: Knowledge-Based Systems. Oct2022, Vol. 254, pN.PAG-N.PAG. 1p.
Publication Year :: 2022
Abstract: Most frame-based speech emotion recognition methods have obtained good results in many applications. However, they segment each speech sample into smaller frames and label every frame with the same emotional tag as the speech sample it came from. This ignores the possibility that a single speech sample contains several emotional categories at the same time. Thus, this paper proposes a self-labeling (SL) learning method for speech emotion recognition, which automatically segments each speech sample into frames and then labels them with the corresponding emotional tags, where the compatibility of these tags is also checked. Then, a time–frequency deep neural network for speech emotion recognition is designed and trained. As most speech emotion datasets are very small, a feature transfer model trained on large-scale audio data is applied to further enhance the performance of the SL learning method. Experimental results on various datasets demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
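
Illustrative note (not part of the published abstract): the conventional frame-labeling scheme that the abstract criticizes can be sketched roughly as below. This is only a minimal sketch of that baseline, in which every frame inherits the label of its utterance; it is not the authors' self-labeling method. The function name `segment_with_inherited_label`, the parameters `frame_len` and `hop`, and the toy signal are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def segment_with_inherited_label(
    samples: Sequence[float],
    label: str,
    frame_len: int,
    hop: int,
) -> List[Tuple[List[float], str]]:
    """Split one utterance into fixed-length frames.

    Hypothetical helper: every frame simply copies the utterance-level
    label, which is the baseline behavior described in the abstract.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames: List[Tuple[List[float], str]] = []
    # Slide a window over the signal; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append((list(samples[start:start + frame_len]), label))
    return frames


# Toy utterance of 8 samples tagged with a single emotion (made-up data).
utterance = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
frames = segment_with_inherited_label(utterance, "angry", frame_len=4, hop=2)
```

Every frame in `frames` carries the tag "angry", even if only part of the utterance actually expresses that emotion. That mismatch is the motivation the abstract gives for assigning tags per frame instead.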