ASQ: An Ultra-Low Bit Rate ASR-Oriented Speech Quantization Method
- Authors
- Ye, Lingxuan; Gao, Changfeng; Cheng, Gaofeng; Luo, Liuping; Zhao, Qingwei
- Abstract
For efficient transmission of speech signals, speech compression methodologies have attracted significant research attention for decades and are widely used in automatic speech recognition (ASR) services. However, most speech codecs are perception-oriented: they retain redundant information and introduce distortion, both of which harm ASR systems. Recently, the emergence of neural network-based models has significantly advanced both ASR systems and speech coding, laying the foundation for a speech compression method optimized specifically for ASR. In this letter, we propose an ASR-oriented Speech Quantization (ASQ) method to reduce communication costs for speech recognition systems. In the proposed method, a speech quantization model first converts the speech into low bit rate tokens. The tokens are then transmitted to the server and recognized by a quantized speech recognition model. The two models can be jointly trained in the end-to-end (E2E) style. To mitigate the performance degradation introduced by the quantization components, we design an entropy-guided 3-stage training method that encourages the model to fully utilize the token space and promotes recognition accuracy. Experimental results on the LibriSpeech corpus show that, compared to an existing non-quantized ASR model with a 256 kbps transmission bit rate, the proposed method can achieve a transmission bit rate of 0.6 kbps with no effect on word error rate (WER). It also significantly surpasses the 2-step pipeline that first compresses speech with a codec and then recognizes it, at a several-times-lower bit rate.
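The bit-rate figures in the abstract can be sanity-checked with simple arithmetic: raw PCM speech at 16 kHz with 16-bit samples costs 256 kbps, while a discrete-token stream costs (frames per second) × log2(codebook size) bits per second. The sketch below illustrates this; the 100 tokens/s and 64-entry codebook used to reach 0.6 kbps are hypothetical illustrative values, not configurations stated in the abstract.

```python
import math

def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of raw PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample

def token_bitrate_bps(frame_rate_hz: float, codebook_size: int) -> float:
    """Bit rate of a quantized token stream: one codebook index per frame,
    each costing log2(codebook_size) bits."""
    return frame_rate_hz * math.log2(codebook_size)

# Uncompressed baseline cited in the abstract: 16 kHz x 16 bits = 256 kbps.
print(pcm_bitrate_bps(16_000, 16))   # 256000

# Hypothetical configuration reaching 0.6 kbps:
# 100 tokens/s from a 64-entry codebook (6 bits per token).
print(token_bitrate_bps(100, 64))    # 600.0
```

Any combination of frame rate and codebook size whose product of rate and bits-per-token equals 600 bps would match the reported figure; the entropy-guided training described above aims to keep the effective bits per token close to this log2 upper bound by using the full token space.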
- Published
- 2024