Speaker Diarization through Waveform and Neural Net
- Author
Rustam Latypov and Evgeni Stolov
- Subjects
speaker diarization, neural network, waveform, Telecommunication, TK5101-6720
- Abstract
This paper presents an approach to the speaker diarization problem based on local waveform analysis of speech. We assume that the recorded sound scene contains a known number of sources and that a single microphone is used for recording. The research goal is to develop an algorithm for online speaker diarization, with particular attention paid to limiting the computational resources required. We suppose that the speech file is already segmented so that each segment belongs to a single speaker. Our method is as follows. We divide each segment into non-overlapping fragments of constant length and replace every sample in a fragment with its absolute value. A specific technique is used to choose a threshold value Thr. We then select the portions of the fragments that exceed Thr and encode these revealed parts of the source signals as normalized cumulative sums, each containing the same number of items. These sums serve as input vectors for two types of neural networks. For comparison, we also developed a simple algorithm that fits the problem without using a neural network. Experiments show that end-to-end neural classification of the fragments yields acceptable results.
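The fragment encoding described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the fragment length (`frag_len`), the number of items per vector (`n_items`), the quantile-based threshold rule, and the linear resampling used to give every cumulative sum the same length are our own hypothetical choices, because the abstract does not specify them. The nearest-centroid classifier with a majority vote over fragments (`fit_centroids`, `classify`) is likewise a hypothetical stand-in added for illustration; it is not the simple non-neural algorithm from the paper.

```python
import numpy as np


def fragment_features(segment, frag_len=400, n_items=32, thr=None):
    """Hypothetical sketch: encode one single-speaker segment as fixed-length vectors.

    frag_len, n_items and the default threshold rule are assumptions,
    not values taken from the paper.
    """
    segment = np.asarray(segment, dtype=float)
    n_frags = len(segment) // frag_len  # drop the incomplete tail
    # Non-overlapping fragments of constant length, samples replaced by |x|.
    frags = np.abs(segment[: n_frags * frag_len]).reshape(n_frags, frag_len)
    if thr is None:
        # Assumed stand-in for the paper's (unspecified) threshold technique.
        thr = float(np.quantile(frags, 0.75)) if frags.size else 0.0
    features = []
    for frag in frags:
        revealed = frag[frag > thr]  # samples that exceed Thr
        if revealed.size < 2:
            continue  # too little signal above Thr to describe this fragment
        cs = np.cumsum(revealed)
        cs /= cs[-1]  # normalized cumulative sum: last item equals 1
        # Assumed: linear resampling so every vector has n_items entries.
        src = np.linspace(0.0, 1.0, cs.size)
        dst = np.linspace(0.0, 1.0, n_items)
        features.append(np.interp(dst, src, cs))
    return np.array(features).reshape(-1, n_items), thr


def fit_centroids(features_by_speaker):
    """Mean feature vector per speaker (hypothetical non-neural baseline)."""
    return {spk: f.mean(axis=0) for spk, f in features_by_speaker.items()}


def classify(features, centroids):
    """Label a segment by majority vote of its fragments' nearest centroids."""
    labels = list(centroids)
    c = np.stack([centroids[k] for k in labels])
    # Euclidean distance from every fragment vector to every centroid.
    dist = np.linalg.norm(features[:, None, :] - c[None, :, :], axis=2)
    votes = np.argmin(dist, axis=1)
    return labels[int(np.bincount(votes, minlength=len(labels)).argmax())]
```

In use, one would call `fragment_features` on labelled enrollment segments for each speaker, build centroids with `fit_centroids`, and label each new segment with `classify`; the same vectors could instead be fed to a neural network classifier, as the paper does. Passing one fixed `thr` to every call keeps the vectors comparable across segments, whereas the default recomputes the threshold per segment.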
- Published
2021