Ideal ratio mask estimation based on cochleagram for audio-visual monaural speech enhancement.

Authors :
Balasubramanian, S.
Rajavel, R.
Kar, Asuthos
Source :
Applied Acoustics. Aug2023, Vol. 211, pN.PAG-N.PAG. 1p.
Publication Year :
2023

Abstract

In this paper, the ideal ratio mask (IRM) is estimated from the speech cochleagram and visual cues using an Audio-Visual Multichannel Convolutional Neural Network (AVMCNN) to enhance the speech signal. Recently, several researchers have shown that speech enhancement using visual data as an additional input alongside the audio data is more effective at reducing the acoustic noise present in the speech signal. This work proposes a novel CNN-based audio-visual IRM estimation model in which the dynamics of both audio and visual signal features are extracted using a multichannel CNN and contextually combined for speech enhancement. The enhanced speech obtained using the proposed model is evaluated in terms of speech quality and intelligibility. The evaluation results indicate that the proposed audio-visual mask estimation model outperforms the audio-only, visual-only, and existing audio-visual mask estimation models. The proposed AVMCNN model thus proves effective in combining the dynamics of the audio features with the visual speech features for speech enhancement. [ABSTRACT FROM AUTHOR]
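
For readers unfamiliar with the term, the ideal ratio mask is commonly defined per time-frequency unit as the ratio of speech energy to speech-plus-noise energy, raised to an exponent (often 0.5). The sketch below illustrates that common definition only. It is not the AVMCNN model from the paper, which estimates the mask from cochleagram and visual features. The function names, the `beta` and `eps` parameters, and the toy energy values are assumptions made for illustration.

```python
import numpy as np

def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5, eps=1e-12):
    """Common IRM definition: (S / (S + N)) ** beta per time-frequency unit.

    speech_energy, noise_energy: arrays of non-negative energies with the
    same shape (for example, channels x frames of a cochleagram).
    beta and eps are illustrative defaults, not values taken from the paper.
    """
    s = np.asarray(speech_energy, dtype=float)
    n = np.asarray(noise_energy, dtype=float)
    # eps guards against division by zero in silent units.
    return (s / (s + n + eps)) ** beta

def apply_mask(noisy_tf, mask):
    """Scale a noisy time-frequency representation by a mask in [0, 1]."""
    return np.asarray(noisy_tf, dtype=float) * mask

# Toy 2x2 example (hypothetical energies, not data from the paper).
speech = np.array([[1.0, 0.0], [3.0, 1.0]])
noise = np.array([[1.0, 1.0], [1.0, 3.0]])
mask = ideal_ratio_mask(speech, noise)
enhanced = apply_mask(speech + noise, mask)
```

In a learned system, a network would predict `mask` from the noisy input (and, in the audio-visual case, from visual features), since the clean speech and noise energies are not available at test time.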

Details

Language :
English
ISSN :
0003682X
Volume :
211
Database :
Academic Search Index
Journal :
Applied Acoustics
Publication Type :
Academic Journal
Accession number :
170745370
Full Text :
https://doi.org/10.1016/j.apacoust.2023.109524