Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis
- Source :
- SLT
- Publication Year :
- 2021
- Publisher :
- IEEE, 2021.
Abstract
- Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in systems dealing with speech separation, speaker diarization, and automatic speech recognition (ASR) in the last decade, it has become possible to build pipelines that achieve reasonable error rates on this task. In this paper, we propose an end-to-end modular system for the LibriCSS meeting data, which combines independently trained separation, diarization, and recognition components, in that order. We study the effect of different state-of-the-art methods at each stage of the pipeline, and report results using task-specific metrics like SDR and DER, as well as downstream WER. Experiments indicate that the problem of overlapping speech for diarization and ASR can be effectively mitigated with the presence of a well-trained separation module. Our best system achieves a speaker-attributed WER of 12.7%, which is close to that of a non-overlapping ASR. Accepted to IEEE SLT 2021.
- Subjects :
- FOS: Computer and information sciences
Sound (cs.SD)
Computer science
Speech recognition
Modular system
020206 networking & telecommunications
02 engineering and technology
Computer Science - Sound
Speaker diarisation
030507 speech-language pathology & audiology
03 medical and health sciences
Audio and Speech Processing (eess.AS)
Error analysis
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
Task analysis
Subtitle
Transcription (software)
0305 other medical science
Electrical Engineering and Systems Science - Audio and Speech Processing
Details
- Database :
- OpenAIRE
- Journal :
- 2021 IEEE Spoken Language Technology Workshop (SLT)
- Accession number :
- edsair.doi.dedup.....0698dfaa446e0f8d950c676d7accdee8
- Full Text :
- https://doi.org/10.1109/slt48900.2021.9383556