Author: "Jiaen Liang" / Topic: speech recognition - Searchworks@Jio Institute Digital Library Search Results

1. Mask-based blind source separation and MVDR beamforming in ASR

Author: Jiaen Liang, Yanhua Long, Yijie Li, and Renke He
Subjects: Beamforming, Linguistics and Language, Computer science, Speech recognition, Cocktail party effect, Blind signal separation, Language and Linguistics, Human-Computer Interaction, Speech enhancement, Reduction (complexity), Background noise, Minimum-variance unbiased estimator, Source separation, Computer Vision and Pattern Recognition, Software
Abstract: This paper presents a front-end enhancement system for automatic speech recognition (ASR) to address the cocktail party problem, which is the task of recognizing the target speech when multiple speakers talk in noisy real-world environments. Many conventional techniques have been proposed. In this work, we propose a new framework that integrates conventional blind source separation (BSS) and the minimum variance distortionless response (MVDR) beamformer for the speech enhancement and source separation tasks of the recent CHiME-5 challenge. In our experiments, we found that the time–frequency (T–F) mask estimation strategy based on the BSS algorithm should differ between speech enhancement and source separation. The main difference is whether background noise needs to be accounted for as an additional class during T–F mask estimation. Experimental results showed that the proposed framework was very beneficial for improving speech recognition performance on the single-array track of CHiME-5. We obtained a 13.5% relative WER reduction over the official baseline system by improving only the front-end speech enhancement framework.
Published: 2019
Full Text: View/download PDF

2. Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR

Author: Haizhou Li, Emre Yilmaz, Jiaen Liang, Yanhua Long, Grandee Lee, and Xinyuan Zhou
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer science, Speech recognition, Computer Science - Sound, Rule-based machine translation, Audio and Speech Processing (eess.AS), Test set, FOS: Electrical engineering, electronic engineering, information engineering, Embedding, Transformer, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The Transformer has shown impressive performance in automatic speech recognition. It uses the encoder-decoder structure with self-attention to learn the relationship between the high-level representation of the source inputs and the embedding of the target outputs. In this paper, we propose a novel decoder structure that features a self-and-mixed attention decoder (SMAD) with a deep acoustic structure (DAS) to improve the acoustic representation of Transformer-based LVCSR. Specifically, we introduce a self-attention mechanism to learn a multi-layer deep acoustic structure for multiple levels of acoustic abstraction. We also design a mixed attention mechanism that simultaneously learns the alignment between different levels of acoustic abstraction and their corresponding linguistic information in a shared embedding space. The ASR experiments on Aishell-1 show that the proposed structure achieves CERs of 4.8% on the dev set and 5.1% on the test set, which are the best results obtained on this task to the best of our knowledge.
Comment: Accepted by INTERSPEECH 2020
Published: 2020
Full Text: View/download PDF

3. Exploring nuisance attribute projection and score normalization for GLDS-SVM based automatic mispronunciation detection method

Author: Bo Xu, Shen Huang, Jiaen Liang, Hongyan Li, and ShiJin Wang
Subjects: Support vector machine, Normalization (statistics), Speech recognition, Softmax function, Posterior probability, Pattern recognition, Artificial intelligence, Performance improvement, Speaker recognition, Mathematics
Abstract: In the task of mispronunciation detection, cross-speaker degradation and other confusable nuisance factors are challenging problems that demand prompt solutions. In this paper, we attempt to remove non-pronunciation variations in the GLDS-SVM expansion space using a nuisance attribute projection strategy, in order to increase the separability between different phoneme instances. Moreover, several score normalization methods, namely softmax, posterior probability vector (PPV), Z-norm and T-norm, are compared. Experiments on three kinds of speech corpora demonstrate the effectiveness of the above methods; the performance improvement is not very significant, but it is consistent.
Published: 2011
Full Text: View/download PDF
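
Note on the record above: of the score normalization methods the abstract lists, Z-norm has a simple standard form, which subtracts the mean of a set of impostor scores and divides by their standard deviation. The sketch below illustrates only that general definition; the function name, the use of the population standard deviation, and the sample scores are assumptions for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def z_norm(raw_score, impostor_scores):
    """Zero-normalize a detection score against impostor scores.

    `impostor_scores` is a hypothetical list of scores that the same
    model produced on non-matching (impostor) data.
    """
    mu = mean(impostor_scores)
    sigma = pstdev(impostor_scores)  # population std; an assumption here
    if sigma == 0:
        raise ValueError("impostor scores have zero variance")
    return (raw_score - mu) / sigma


# Illustrative values only, not data from the paper.
impostors = [1.0, 2.0, 3.0, 4.0, 5.0]
print(z_norm(5.0, impostors))
```

T-norm follows the same shape, except that the statistics come from scoring the test utterance against a cohort of other models.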

4. Exploring goodness of prosody by diverse matching templates

Author: Jiaen Liang, Shen Huang, Hongyan Li, Bo Xu, and ShiJin Wang
Subjects: Template, Computer science, Speech recognition, Momel, Automatic speech, Pattern recognition, Artificial intelligence, Prosody, Query by humming, Sentence
Abstract: In automatic speech grading systems, little research has addressed the issue of GOR (Goodness Of pRosody). In this paper we propose a novel method that takes advantage of our QBH (Query By Humming) techniques from the 2008 MIREX evaluation task. A set of standard samples from top-performing students is first selected as templates; a cascade QBH structure is then built from two metrics: MOMEL stylization followed by DTW distance, and the Fujisaki model followed by EMD distance. Sentence-level GOR is obtained from the fused confidence between the target and each template, and a weighted sum of these gives the goodness at the passage level. Experimental results indicate that performance increases with the number of templates, and that the Fujisaki-EMD metric outperforms the MOMEL-DTW one in terms of correlation. Their combination can be treated as a template-based GOR score; when it is complemented with our previous feature-based GOR score, the approach achieves a correlation of 0.432 and an EER of 17.90% on our corpus. Index Terms: speech prosody, query by humming
Published: 2010
Full Text: View/download PDF

5. Context Dependent Feature Based Bottom-up Rescoring SVM Classifier in Children's English Stress Mis-pronunciation Detection

Author: Jiaen Liang, Shen Huang, ShiJin Wang, Bo Xu, and Hongyan Li
Subjects: Computer science, Speech recognition, Feature extraction, Word error rate, Pronunciation, Weighting, Support vector machine, Vowel, Stress (linguistics), Artificial intelligence, Natural language, Natural language processing
Abstract: Automatic assessment of word stress errors is an integral part of an oral language grading system. However, two problems remain unsolved: the properties of a vowel depend on its context, and data for the different vowel classes are sparse. This paper briefly introduces a hybrid method that combines traditional prosodic features with the proposed context-dependent strategies. In classification, word stress is determined by weighting a bottom-up group tree with a modified distributed probability score. In experiments, the overall equal error rate of the proposed system reaches 9.41%, a relative reduction that demonstrates its suitability for use in stress error detection systems.
Published: 2009
Full Text: View/download PDF

6. An efficient mispronunciation detection method using GLDS-SVM and formant enhanced features

Author: ShiJin Wang, Hongyan Li, Jiaen Liang, and Bo Xu
Subjects: Computer science, Speech recognition, Feature extraction, Pattern recognition, Support vector machine, Reduction (complexity), ComputingMethodologies_PATTERNRECOGNITION, Formant, Component (UML), Mel-frequency cepstrum, Artificial intelligence, Hidden Markov model
Abstract: Mispronunciation detection is an important component of computer assisted language learning (CALL) systems. In this work, we introduce an efficient GLDS-SVM based detection method, which has been used successfully in language and speaker identification systems, and combine it with traditional methods. The main ideas are: extending MFCC features with normalized formant trajectory information; a novel multi-model training strategy that makes full use of the samples and addresses data imbalance; and combining the GLDS-SVM method with a UBM-GMM system to further improve performance. Experiments show that GLDS-SVM is far more efficient than the traditional RBF-SVM, and that the fused system achieves a significant relative EER reduction of 17.5% compared with the baseline UBM-GMM system.
Published: 2009
Full Text: View/download PDF

7. Improving searching speed and accuracy of query by humming system based on three methods: feature fusion, candidates set reduction and multiple similarity measurement rescoring

Author: Lei Wang, Jiaen Liang, Sheng Hu, Shen Huang, and Bo Xu
Subjects: Set (abstract data type), Reduction (complexity), Feature fusion, Similarity (network science), Computer science, Speech recognition, Pattern recognition, Artificial intelligence, Query by humming
Published: 2008
Full Text: View/download PDF

8. An effective and efficient method for query by humming system based on multi-similarity measurement fusion

Author: Lei Wang, Sheng Hu, Shen Huang, Jiaen Liang, and Bo Xu
Subjects: Dynamic time warping, Computer science, Speech recognition, Search engine indexing, Sensor fusion, Machine learning, Query by humming, Similarity (network science), Pattern recognition (psychology), Music information retrieval, Artificial intelligence, Earth mover's distance
Abstract: Since it is the most natural way for people to search for a specific melody in a large music database, query by humming/singing is attracting more and more researchers' attention in the field of content-based music information retrieval. In this task, note-based and frame-based similarity measures are two commonly used methods. However, previous work has always focused on only one of the two. In this paper, we propose a novel scheme that takes advantage of two different similarity measurements to improve both retrieval accuracy and retrieval speed. First, Earth Mover's Distance (EMD), which is note-based and much faster, is adopted to eliminate most of the unlikely candidates. Then, Dynamic Time Warping (DTW), which is frame-based and more accurate, is executed on the surviving candidates. Finally, fusion strategies for these two similarity measurements are employed to improve the performance of the whole system. Experiments show that our approach achieves 92.9% accuracy on the database used in the MIREX 2006 QBH contest, which is better than the systems that participated in that task.
Published: 2008
Full Text: View/download PDF
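
Note on the record above: the prune-then-rescore idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system. The cheap first stage below is a placeholder (difference of mean pitch) standing in for the note-based EMD, the second stage is a textbook DTW on raw pitch sequences, and the final score-fusion step is omitted. All names, parameters and melodies are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def coarse_distance(a, b):
    """Placeholder cheap measure (NOT EMD): gap between mean pitches."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def two_stage_search(query, database, keep):
    """Prune with the cheap measure, then rank the survivors by DTW."""
    survivors = sorted(database, key=lambda k: coarse_distance(query, database[k]))[:keep]
    ranked = [(name, dtw_distance(query, database[name])) for name in survivors]
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical pitch sequences (MIDI note numbers), for illustration only.
melodies = {
    "a": [60, 62, 64, 65],
    "b": [60, 61, 62, 64, 65],
    "c": [40, 42, 44, 45],
    "d": [70, 71, 72, 73],
}
print(two_stage_search([60, 62, 64, 65], melodies, keep=2))
```

The point of the design is that the quadratic-time DTW runs only on the `keep` candidates that survive the cheap pass, which is where the speed gain described in the abstract would come from.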

9. Music Genre Classification Based on Multiple Classifier Fusion

Author: Shen Huang, ShiJin Wang, Jiaen Liang, Lei Wang, and Bo Xu
Subjects: Image fusion, Statistical classification, Contextual image classification, Computer science, Speech recognition, Feature extraction, Word error rate, Pattern recognition, Artificial intelligence, Mel-frequency cepstrum, Random forest
Abstract: Although researchers have made great progress on music genre classification in recent years, the need for more accurate systems is still not satisfied. In this paper, we propose a method for further reducing the classification error rate based on multiple classifier fusion. First, MFCCs and four features from the MPEG-7 audio descriptors are extracted in every short-time frame; a group of frames is then gathered into a longer segment, over which the mean and variance of the short-time frame features are calculated. The segment is the basic unit for the training and testing modules. Then a random forest (RF) and a multilayer perceptron neural network (MLP) are run on each segment independently. Finally, a weighted voting fusion strategy is employed to fuse the results of the two classifiers on each segment, and the whole-file decision is made by selecting the most frequently labeled genre over all the segments. Experiments showed that the approach is effective. The fused result achieves a 12.4% relative reduction in error rate compared with our baseline system.
Published: 2008
Full Text: View/download PDF
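
Note on the record above: the last two steps in the abstract, per-segment weighted fusion of two classifiers followed by a majority vote over segments, can be sketched as below. The abstract does not give the exact weighting scheme, so the convex combination of class scores, the weight `w_rf`, and the example scores are all assumptions for illustration.

```python
from collections import Counter


def fuse_segment(p_rf, p_mlp, w_rf=0.5):
    """Combine two per-class score dicts for one segment; return the top genre.

    The convex weighting is an assumed form of "weighted voting".
    """
    fused = {g: w_rf * p_rf[g] + (1.0 - w_rf) * p_mlp[g] for g in p_rf}
    return max(fused, key=fused.get)


def classify_file(rf_scores, mlp_scores, w_rf=0.5):
    """Label every segment, then pick the most frequent genre for the file."""
    labels = [fuse_segment(a, b, w_rf) for a, b in zip(rf_scores, mlp_scores)]
    return Counter(labels).most_common(1)[0][0]


# Hypothetical per-segment scores from the two classifiers.
rf = [{"rock": 0.7, "jazz": 0.3}, {"rock": 0.4, "jazz": 0.6}, {"rock": 0.6, "jazz": 0.4}]
mlp = [{"rock": 0.6, "jazz": 0.4}, {"rock": 0.3, "jazz": 0.7}, {"rock": 0.2, "jazz": 0.8}]
print(classify_file(rf, mlp))
```

With equal weights the third segment tips to jazz and decides the file; shifting the weight toward the random forest flips that segment, which shows how the fusion weight can change the file-level decision.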

10. Histogram Based Double Gaussian Feature Normalization For Robust Language Recognition

Author: ShiJin Wang, Bo Xu, and Jiaen Liang
Subjects: Computer science, Color normalization, Speech recognition, Feature extraction, Normalization (image processing), Pattern recognition, Speaker recognition, Automatic target recognition, Histogram, Language model, Artificial intelligence, Gaussian process, Natural language processing
Abstract: In automatic language recognition, performance can be seriously degraded by the transfer characteristics of the communication channel. Many methods have been proposed to compensate for the effect of the environment and obtain better recognition results. In this paper, we propose a histogram based double Gaussian feature normalization method for robust language recognition. Compared with the baseline system, the proposed method achieves a relative error reduction of 17.4%, which shows advantages over other common feature normalization methods in language recognition systems.
Published: 2007
Full Text: View/download PDF

11. A Novel Phone-State Matrix Based Vocabulary-Independent Keyword Spotting Method for Spontaneous Speech

Author: Peng Gao, JiaEn Liang, Bo Xu, and Peng Ding
Subjects: Vocabulary, Computer science, Speech recognition, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Spotting, Speech processing, Phone, Test set, Keyword spotting, Artificial intelligence, Hidden Markov model, Natural language processing, Decoding methods
Abstract: Keyword spotting (KWS) is an essential technique for speech information retrieval. Offline keyword queries on large volumes of spontaneous speech data require fast and accurate KWS methods. In this paper, a novel phone-state matrix based vocabulary-independent KWS method is proposed, which has the merits of both hidden Markov model (HMM) based and lattice-based methods. Four KWS systems are compared in our experiments on a conversational telephone speech test set. Results show that, relative to the high-precision HMM-based KWS system, the proposed phone-state matrix system has better equal-error-rate (EER) and false-alarm (FA) performance than the other two lattice-based systems.
Published: 2007
Full Text: View/download PDF
