1. Estimation of Ideal Binary Mask for Audio-Visual Monaural Speech Enhancement.
- Author
- Balasubramanian, S., Rajavel, R., and Kar, Asutosh
- Subjects
- *INTELLIGIBILITY of speech, *SPEECH enhancement, *CONVOLUTIONAL neural networks, *SPEECH, *NOISE, *ACOUSTIC emission
- Abstract
- This paper estimates the Ideal Binary Mask (IBM) from the speech cochleagram and visual cues, using an Audio-Visual Convolutional Neural Network (AVCNN), to improve speech intelligibility and quality. Many earlier speech enhancement techniques depended heavily on audio attributes to reduce the noise present in the speech signal. Several recent studies have shown that using visual data as an auxiliary input alongside audio data is more effective at reducing acoustic noise in speech signals. In the proposed work, a multichannel CNN extracts the dynamics of both visual and audio signal features, which are then integrated to estimate the threshold, using the proposed algorithm, that yields the IBM for enhancing the speech signal. The proposed model is evaluated primarily on speech intelligibility, in terms of STOI, ESTOI, and CSII; speech quality is additionally measured in terms of PESQ, SSNR, CSIG, CBAK, and COVL. The evaluation results show that the proposed audio-visual mask estimation model outperforms the audio-only, visual-only, and existing audio-visual mask estimation models. The proposed AVCNN model thus demonstrates its efficiency in merging the dynamics of audio information with visual speech information for speech enhancement. [ABSTRACT FROM AUTHOR]
- Published
- 2023
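
For readers unfamiliar with the term used in the abstract, the following is a minimal sketch of the textbook definition of an ideal binary mask: a time-frequency unit is kept (1) when its local signal-to-noise ratio exceeds a threshold, and discarded (0) otherwise. It is not the authors' AVCNN or their threshold-estimation algorithm. The function name `ideal_binary_mask`, the parameter `lc_db` (a fixed threshold in dB), and the toy energy values are all assumptions made for illustration; in practice the energies would come from a cochleagram of the clean speech and the noise.

```python
import math

def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0, eps=1e-12):
    """Textbook IBM sketch: 1 where the local SNR in dB exceeds lc_db, else 0.

    speech_energy and noise_energy are equally shaped lists of rows, one
    value per time-frequency unit. lc_db is a hypothetical fixed threshold;
    the paper estimates its threshold with a network instead.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            # eps keeps the ratio and the logarithm finite for silent units.
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask

# Toy 2x3 grid of time-frequency energies (illustrative values only).
speech = [[4.0, 0.1, 1.0], [0.5, 9.0, 0.0]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
mask = ideal_binary_mask(speech, noise)
```

With the default 0 dB threshold, only units where speech energy strictly exceeds noise energy are retained; raising or lowering `lc_db` makes the mask more or less selective.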