5 results on '"Zhiying Huang"'
Search Results
2. Rapid speaker adaptation based on D-code extracted from BLSTM-RNN in LVCSR
- Author
- Li-Rong Dai, Zhiying Huang, Shaofei Xue, and Zhi-Jie Yan
- Subjects
Computer science, business.industry, Speech recognition, Context (language use), Pattern recognition, 010501 environmental sciences, Speaker recognition, 01 natural sciences, Reduction (complexity), 030507 speech-language pathology & audiology, 03 medical and health sciences, Discriminative model, Code (cryptography), Artificial intelligence, 0305 other medical science, Cluster analysis, business, Decoding methods, 0105 earth and related environmental sciences, Interpolation
- Abstract
Recently, several fast speaker adaptation methods have been proposed for hybrid DNN-HMM models based on so-called discriminative speaker codes (SC) [1-3] and applied to unsupervised speaker adaptation in speech recognition [4]. SC based methods have been shown to be quite effective in adapting DNNs even when only a very small amount of adaptation data is available. However, these methods must estimate a speaker code for each new speaker through an updating process and obtain the final results through two-pass decoding. In this paper, we propose an alternative d-code extraction method to replace SC, based on modeling speaker information with a BLSTM-RNN, which makes one-pass decoding possible. A speaker clustering approach is then introduced to reduce the number of targets of the speaker-BLSTM, which accelerates training and improves ASR performance at the same time. In addition, an interpolation method is provided that makes use of d-codes from the training set to improve recognition accuracy, especially when adaptation data is limited. Experimental results on the Switchboard task show that the proposed methods achieve a relative reduction in WER (about 9%) comparable to that of the standard SC based adaptation method, without the need for two-pass decoding.
- Published
- 2016
3. Unsupervised speaker adaptation of BLSTM-RNN for LVCSR based on speaker code
- Author
- Li-Rong Dai, Shaofei Xue, Zhi-Jie Yan, and Zhiying Huang
- Subjects
Normalization (statistics), Computer science, business.industry, Speech recognition, Word error rate, Pattern recognition, TIMIT, Speaker recognition, 01 natural sciences, Speaker diarisation, 030507 speech-language pathology & audiology, 03 medical and health sciences, Recurrent neural network, 0103 physical sciences, Singular value decomposition, Artificial intelligence, 0305 other medical science, Hidden Markov model, business, 010301 acoustics
- Abstract
Recently, speaker code based adaptation has been successfully extended to recurrent neural networks using bidirectional Long Short-Term Memory (BLSTM-RNN) [1]. Experiments on the small-scale TIMIT task have demonstrated that speaker code based adaptation is also valid for BLSTM-RNN. In this paper, we evaluate this method on a large-scale task and introduce an error normalization method to balance the back-propagation errors that reach the speaker codes from different layers. We also use singular value decomposition (SVD) to compress the model. Speaker code based adaptation with SVD gives better recognition performance than i-vector based speaker adaptation of the same dimension. Experimental results on the Switchboard task show that speaker code based adaptation on the hybrid BLSTM-DNN topology can achieve more than 9% relative reduction in word error rate (WER) compared to the speaker independent (SI) baseline.
- Published
- 2016
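The abstract of result 3 mentions SVD for model compression without saying which weight matrices are factored or what rank is kept. As a minimal sketch of the general idea only, the NumPy example below replaces one weight matrix by the product of two low-rank factors; the 512 x 256 matrix, the rank of 64, and the name `svd_compress` are all assumptions made for illustration, not details from the paper.

```python
import numpy as np

def svd_compress(weight, rank):
    """Factor an (m x n) matrix into (m x rank) and (rank x n) factors via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # scale the leading left singular vectors by their singular values
    b = vt[:rank, :]             # keep the leading right singular vectors
    return a, b

# Hypothetical layer size and rank, chosen only for this illustration.
rng = np.random.default_rng(0)
weight = rng.standard_normal((512, 256))
a, b = svd_compress(weight, rank=64)

params_before = weight.size          # parameters in the original matrix
params_after = a.size + b.size       # parameters in the two factors
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
print(params_before, params_after, rel_error)
```

A random Gaussian matrix has no low-rank structure, so the reconstruction error here is large; trained weight matrices are usually compressed this way on the expectation that their singular values decay faster.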
4. Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code
- Author
- Zhiying Huang, Li-Rong Dai, Shaofei Xue, and Jian Tang
- Subjects
Artificial neural network, Computer science, business.industry, Speech recognition, Word error rate, Acoustic model, 020207 software engineering, TIMIT, Pattern recognition, 02 engineering and technology, Speaker recognition, Speaker diarisation, 030507 speech-language pathology & audiology, 03 medical and health sciences, Recurrent neural network, Phone, 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, 0305 other medical science, business, Adaptation (computer science)
- Abstract
Recently, the recurrent neural network with bidirectional Long Short-Term Memory (RNN-BLSTM) acoustic model has been shown to give strong performance on TIMIT [1] and other speech recognition tasks. Meanwhile, the speaker code based adaptation method has been demonstrated to be a valid adaptation method for the Deep Neural Network (DNN) acoustic model [2]. However, to the best of our knowledge, whether the speaker code based adaptation method is also valid for RNN-BLSTM has not been reported. In this paper, we study how to conduct effective speaker code based speaker adaptation on RNN-BLSTM and demonstrate that it is also a valid adaptation method for RNN-BLSTM. Experimental results on TIMIT show that adaptation of the RNN-BLSTM can achieve over 10% relative reduction in phone error rate (PER) compared to no adaptation. A set of comparative experiments is then carried out to analyze the different contributions of adaptation on the cell input and on each gate activation function of the BLSTM. We find that adaptation on the cell input activation function is more effective than adaptation on each gate activation function.
- Published
- 2016
5. Multi-array Data Fusion Based Direct Position Determination Algorithm
- Author
- Jiang Wu and Zhiying Huang
- Subjects
Noise, Position (vector), Computer science, Particle swarm optimization, Direction of arrival, Antenna (radio), Sensor fusion, Algorithm, Eigendecomposition of a matrix, Subspace topology
- Abstract
Traditional multi-station localization systems need to estimate the direction of arrival (DOA) to determine the location of the target. In low-SNR situations the localization fails because of large estimation errors. To improve the performance of the localization system at low SNR, we investigate the direct position determination problem from passive measurements made with several base-station antenna arrays in the case of an emitting source. In this paper, a multi-array data fusion based direct position determination algorithm (MDF-DPD) is proposed. The output data of each array are first processed with autocorrelation and eigenvalue decomposition to obtain the noise subspace. The position of the target is then determined by searching the monitoring area, using the quantum-behaved particle swarm optimization (QPSO) algorithm, for the array response that has the biggest sum of projections onto the respective noise subspaces. Simulation results demonstrate that the algorithm performs better than the classical approach in low-SNR situations.
- Published
- 2014
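The pipeline outlined in the abstract of result 5 (per-array covariance, eigendecomposition into a noise subspace, then a position search over a fused cost) can be sketched roughly as follows. This is not the MDF-DPD algorithm itself: it assumes far-field, half-wavelength uniform linear arrays laid along the x-axis, a hypothetical three-array geometry, and a plain grid search standing in for QPSO. It also uses the standard subspace-orthogonality form, where the candidate position minimizes the steering vector's energy in the noise subspaces, whereas the abstract phrases the criterion as a maximization.

```python
import numpy as np

def steering(array_pos, point, n_elem):
    """Far-field steering vector of a half-wavelength ULA laid along the x-axis (assumed model)."""
    dx, dy = point[0] - array_pos[0], point[1] - array_pos[1]
    cos_theta = dx / np.hypot(dx, dy)
    return np.exp(1j * np.pi * np.arange(n_elem) * cos_theta)

def noise_subspace(snapshots, n_sources=1):
    """Eigendecompose the sample covariance and keep the weakest eigenvectors."""
    cov = snapshots @ snapshots.conj().T / snapshots.shape[1]
    _, vecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return vecs[:, : snapshots.shape[0] - n_sources]

def fused_cost(point, arrays, subspaces, n_elem):
    """Sum over arrays of the normalized steering vector's energy in each noise subspace."""
    total = 0.0
    for pos, en in zip(arrays, subspaces):
        a = steering(pos, point, n_elem) / np.sqrt(n_elem)
        total += np.linalg.norm(en.conj().T @ a) ** 2
    return total

# Hypothetical scenario: three 8-element arrays on the x-axis and one emitter.
rng = np.random.default_rng(0)
n_elem, n_snap = 8, 200
arrays = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
source = (60.0, 80.0)
noise_std = 0.1  # noise power 0.01 against a unit-power signal, i.e. 20 dB SNR

signal = (rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)) / np.sqrt(2)
subspaces = []
for pos in arrays:
    noise = noise_std * (rng.standard_normal((n_elem, n_snap))
                         + 1j * rng.standard_normal((n_elem, n_snap))) / np.sqrt(2)
    x = np.outer(steering(pos, source, n_elem), signal) + noise
    subspaces.append(noise_subspace(x))

# Grid search over the monitoring area (a stand-in for QPSO in this sketch).
grid = [(gx, gy) for gx in np.arange(0.0, 201.0, 10.0) for gy in np.arange(10.0, 151.0, 10.0)]
estimate = min(grid, key=lambda p: fused_cost(p, arrays, subspaces, n_elem))
print(estimate)
```

Fusing the per-array costs before searching is what removes the intermediate DOA estimation step: no array reports a bearing, and the position is chosen directly from the combined subspace fit.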