Robust target-speaker speech separation algorithm.

Authors :: 张新
 付中华
Source :: Application Research of Computers / Jisuanji Yingyong Yanjiu. Jun2022, Vol. 39 Issue 6, p1749-1759. 11p.
Publication Year :: 2022
Abstract: Target-speaker speech separation aims to extract one speaker's speech from a mixture of multiple speakers, guided by a feature vector that identifies the target. There are two ways to obtain this vector: use a one-hot vector, or adaptively generate an embedding containing the target speaker's characteristics from a classification neural network. A one-hot vector can achieve excellent performance during training, but it cannot handle unseen speakers outside the training set. An embedding vector sacrifices part of the training performance, but it generalizes well to unseen speakers. To overcome the shortcomings of using either the one-hot vector or the embedding vector alone in target-speaker speech separation, this paper proposes a hybrid training method that alternately uses the one-hot vector and the embedding vector as the identity feature vector of the target speaker. By mapping the one-hot vector and the embedding into a common space, the proposed method achieves good generalization while preserving training performance. Experimental results show that the proposed method achieves an SDR improvement of more than 10 dB when separating the speech of unseen speakers. [ABSTRACT FROM AUTHOR]
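Illustrative note: the hybrid idea described in the abstract can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation. The abstract does not specify the form of the mappings, the alternation schedule, or any dimensions, so the two linear projections, the even/odd alternation rule, and every name and size below (`W_ONEHOT`, `W_EMBED`, `identity_vector`, `SHARED_DIM`, and so on) are assumptions made only for illustration. In a real system the projections would be learned jointly with the separation network; here they are random.

```python
import random

random.seed(0)

# Hypothetical sizes; the abstract does not give any of these.
NUM_TRAIN_SPEAKERS = 4  # speakers seen in training (one-hot length)
EMBED_DIM = 8           # size of the speaker-classifier embedding
SHARED_DIM = 6          # size of the common space both vectors map into


def random_matrix(rows, cols):
    """Stand-in for a learned projection: a rows x cols matrix of Gaussian values."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(vec, matrix):
    """Return the vector-matrix product vec @ matrix."""
    return [
        sum(v * row[j] for v, row in zip(vec, matrix))
        for j in range(len(matrix[0]))
    ]


# Assumed linear maps into the common space (random here, learned in practice).
W_ONEHOT = random_matrix(NUM_TRAIN_SPEAKERS, SHARED_DIM)
W_EMBED = random_matrix(EMBED_DIM, SHARED_DIM)


def one_hot(speaker_id, num_speakers=NUM_TRAIN_SPEAKERS):
    """One-hot identity vector for a training speaker."""
    vec = [0.0] * num_speakers
    vec[speaker_id] = 1.0
    return vec


def identity_vector(step, speaker_id, embedding):
    """Alternate the identity source by training step (assumed schedule):
    even steps use the one-hot vector, odd steps use the embedding.
    Both are mapped into the same SHARED_DIM-dimensional space."""
    if step % 2 == 0:
        return project(one_hot(speaker_id), W_ONEHOT)
    return project(embedding, W_EMBED)


def identity_vector_unseen(embedding):
    """At test time an unseen speaker has no one-hot index,
    so only the embedding path can be used."""
    return project(embedding, W_EMBED)
```

Because both paths end in the same space, the separation network that consumes the identity vector does not need to know which source produced it, which is what would let the embedding path serve speakers outside the training set.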