
The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results

Authors :
Hang Chen
Hengshun Zhou
Jun Du
Chin-Hui Lee
Jingdong Chen
Shinji Watanabe
Sabato Marco Siniscalchi
Odette Scharenborg
Di-Yuan Liu
Bao-Cai Yin
Jia Pan
Jian-Qing Gao
Cong Liu
Source :
Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication Year :
2022
Publisher :
IEEE, 2022.

Abstract

In this paper we discuss the rationale of the Multimodal Information based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of submitted systems and evaluation results. The MISP Challenge aims at tackling speech processing tasks in different scenarios by introducing information about an additional modality (e.g., video, or text), which will hopefully lead to better environmental and speaker robustness in realistic applications. In the first MISP challenge, two benchmark datasets recorded in a real-home TV room with two reproducible open-source baseline systems have been released to promote research in audio-visual wake word spotting (AVWWS) and audio-visual speech recognition (AVSR). To our knowledge, MISP is the first open evaluation challenge to tackle real-world issues of AVWWS and AVSR in the home TV scenario.

Details

Database :
OpenAIRE
Journal :
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Accession number :
edsair.doi.dedup.....0b6f807977016422b1ea2e0ee65408c6
Full Text :
https://doi.org/10.1109/icassp43922.2022.9746683