1. Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
- Author
Wang, Peidong; Kanda, Naoyuki; Xue, Jian; Li, Jinyu; Wang, Xiaofei; Subramanian, Aswin Shanmugam; Chen, Junkun; Sivasankaran, Sunit; Xiao, Xiong; and Zhao, Yong
- Subjects
Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Streaming multi-talker speech translation is a task that involves not only generating accurate and fluent translations with low latency but also recognizing when a speaker change occurs and what the speaker's gender is. Speaker change information can be used to create audio prompts for a zero-shot text-to-speech system, and gender can help to select speaker profiles in a conventional text-to-speech model. We propose to tackle streaming speaker change detection and gender classification by incorporating speaker embeddings into a transducer-based streaming end-to-end speech translation model. Our experiments demonstrate that the proposed methods can achieve high accuracy for both speaker change detection and gender classification.
- Published
- 2025