1. An End-to-End Speech Separation Method Based on Features of Two Domains
- Author
Yu, Yongsheng; Qiu, Xiangyu; Hu, Fucai; He, Ruhan; Zhang, Linke
- Abstract
Purpose: Current mainstream methods for single-channel speech separation generally rely on a feature extraction step such as the short-time Fourier transform and on long input sequences. As a result, they do not fully exploit the information in the speech features and introduce signal delay into the separation. Methods: To achieve better performance with a lightweight model, a fully convolutional end-to-end audio separation network is proposed based on features from two domains, i.e., the temporal domain and the channel domain. It considers not only the temporal correlation of speech signals but also the correlation between channels in the signal feature map. First, the end-to-end network samples and encodes the speech waveform with a convolution over non-overlapping segments. It then computes the separation mask by convolving the encoded feature space along both the time and inter-channel dimensions. Finally, it decodes the masked feature space to reconstruct the waveform. Results: The proposed end-to-end speech separation method makes full use of the feature-space information of speech signals. In addition, the separation module introduces a residual structure and dilated convolutions, which improve separation accuracy and computational speed with fewer parameters. Experiments show that, compared with the baseline Conv-TasNet, the proposed model improves the SI-SNR (scale-invariant source-to-noise ratio) by 3.1 dB on the WSJ0-Mix2 dataset. Conclusion: This paper proposes an improved speech separation algorithm. Compared with Conv-TasNet, it improves separation performance while inheriting Conv-TasNet's lightweight design. In the task of separating speech signals mixed at a random signal-to-noise ratio (SNR) between −5 and 5 dB, the proposed algorithm achieves relatively high accuracy.
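- Note
The abstract reports a 3.1 dB gain in SI-SNR over the Conv-TasNet baseline. For context, the sketch below shows how the scale-invariant source-to-noise ratio is typically computed for separation systems of this kind; the NumPy helper and the toy signals are illustrative assumptions, not code from the paper.

```python
import numpy as np

def si_snr(estimate: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant source-to-noise ratio (SI-SNR) in dB, the metric
    commonly used to evaluate Conv-TasNet-style separators."""
    # Remove the mean so the measure is invariant to DC offset.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain its "target" component.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    # Everything orthogonal to the target is treated as noise.
    e_noise = estimate - s_target
    return 10.0 * np.log10((np.dot(s_target, s_target) + eps) /
                           (np.dot(e_noise, e_noise) + eps))

# Toy check: a rescaled copy of the target scores far higher than a noisy one.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)            # 1 s of "speech" at 16 kHz
noisy = clean + 0.5 * rng.standard_normal(16000)
print(si_snr(0.7 * clean, clean))             # very large: scaling does not matter
print(si_snr(noisy, clean))                   # roughly 6 dB
```

Because the projection step removes any dependence on overall amplitude, a separator is rewarded only for recovering the waveform's shape, which is why SI-SNR improvement is the standard comparison figure on WSJ0 mixture benchmarks.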
- Published
2024