Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows
- Source :
- Applied Sciences, Vol 9, Iss 13, p 2761 (2019); UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
- Publication Year :
- 2019
- Publisher :
- MDPI AG, 2019.
Abstract
- Restricted Boltzmann Machines (RBMs) have shown success in both the front-end and back-end of speaker verification systems. In this paper, we propose applying RBMs to the front-end for the tasks of speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because of the lack of data for a test speaker, we propose RBM adaptation to a global model. First, the global model, which is referred to as the universal RBM, is trained with all the available background data. Then an adapted RBM model is trained with the data of each test speaker. The visible-to-hidden weight matrices of the adapted models are concatenated along with the bias vectors and are whitened to generate the vector representation of speakers. These vectors, referred to as RBM vectors, were shown to preserve speaker-specific information and are used in the tasks of speaker clustering and speaker tracking. The evaluation was performed on the audio recordings of Catalan TV broadcast shows. The experimental results show that our proposed speaker clustering system achieved up to a 12% relative improvement, in terms of Equal Impurity (EI), over the baseline system. In the task of speaker tracking, our system achieved relative improvements of 11% and 7% over the baseline system using cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively.
- Subjects :
- speaker segmentation
Computer science
Speech recognition
Restricted Boltzmann machine adaptation
Boltzmann machine
02 engineering and technology
Tracking (particle physics)
lcsh:Technology
lcsh:Chemistry
Broadcast television systems
Machine learning
0202 electrical engineering, electronic engineering, information engineering
Trigonometric functions
General Materials Science
Speech processing systems
Machine learning -- Algorithms
Cluster analysis
Representation (mathematics)
Speaker tracking
lcsh:QH301-705.5
Instrumentation
Fluid Flow and Transfer Processes
Restricted Boltzmann machine
Speaker clustering
lcsh:T
Process Chemistry and Technology
Agglomerative hierarchical clustering
General Engineering
020206 networking & telecommunications
Telecommunications engineering [UPC subject areas]
lcsh:QC1-999
Computer Science Applications
Loudspeakers
Task (computing)
lcsh:Biology (General)
lcsh:QD1-999
lcsh:TA1-2040
Speech processing
020201 artificial intelligence & image processing
lcsh:Engineering (General). Civil engineering (General)
lcsh:Physics
Details
- ISSN :
- 2076-3417
- Volume :
- 9
- Database :
- OpenAIRE
- Journal :
- Applied Sciences
- Accession number :
- edsair.doi.dedup.....d0677cf54d6bcb5c449493ba61bfd572
- Full Text :
- https://doi.org/10.3390/app9132761
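
The abstract describes building an RBM vector by concatenating each adapted model's visible-to-hidden weight matrix with its bias vectors, whitening the result, and then comparing speakers with cosine scoring. The following is a minimal sketch of those three steps only, not the authors' implementation. It assumes the adapted RBM parameters are already available as NumPy arrays, and it uses PCA whitening estimated on a set of background vectors, which is one common choice that the abstract does not specify. All function names and array shapes are hypothetical.

```python
import numpy as np


def rbm_supervector(W, b_vis, b_hid):
    """Concatenate one RBM's weights and biases into a single 1-D vector."""
    return np.concatenate([W.ravel(), b_vis.ravel(), b_hid.ravel()])


def fit_whitening(X, eps=1e-8):
    """Estimate a PCA whitening transform from the rows of X.

    Assumption: PCA whitening; the abstract only says the vectors are whitened.
    Returns the mean vector and a projection matrix P such that
    (X - mean) @ P has (approximately) identity covariance.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > eps * s.max()  # drop directions with (near-)zero variance
    scale = np.sqrt(X.shape[0] - 1) / s[keep]
    P = Vt[keep].T * scale  # shape: (input_dim, kept_dim)
    return mean, P


def whiten(x, mean, P):
    """Apply the whitening transform to one vector or a matrix of row vectors."""
    return (x - mean) @ P


def cosine_score(a, b):
    """Cosine similarity between two RBM vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage with random stand-ins for adapted RBM parameters (hypothetical sizes).
rng = np.random.default_rng(0)
n_vis, n_hid, n_background = 4, 3, 50

background = np.stack([
    rbm_supervector(rng.normal(size=(n_vis, n_hid)),
                    rng.normal(size=n_vis),
                    rng.normal(size=n_hid))
    for _ in range(n_background)
])
mean, P = fit_whitening(background)

speaker_a = whiten(background[0], mean, P)
speaker_b = whiten(background[1], mean, P)
score = cosine_score(speaker_a, speaker_b)
```

In a clustering or tracking setting, scores such as `score` would be computed between pairs of segments or between a segment and a target speaker model; the PLDA scoring mentioned in the abstract is not covered by this sketch.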