1. Perceptually Weighted Analysis-by-Synthesis Vector Quantization for Low Bit Rate MFCC Codec
- Author
- Gang Min, Xia Zou, Jibin Yang, and Xiongwei Zhang
- Subjects
- Computer science, Speech recognition, Mean opinion score, Speech coding, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Intelligibility (communication), 030507 speech-language pathology & audiology, 03 medical and health sciences, Codec2, Distortion, 0202 electrical engineering, electronic engineering, information engineering, Codec, Electrical and Electronic Engineering, business.industry, Applied Mathematics, Codebook, Vector quantization, 020206 networking & telecommunications, Pattern recognition, PSQM, Adaptive Multi-Rate audio codec, Signal Processing, Mel-frequency cepstrum, Artificial intelligence, 0305 other medical science, business, Harmonic Vector Excitation Coding
- Abstract
- This letter presents a perceptually weighted analysis-by-synthesis vector quantization (VQ) algorithm for a low bit rate MFCC codec. Unlike conventional VQ of the mel-frequency cepstral coefficient (MFCC) vector, the algorithm uses an analysis-by-synthesis technique to minimize the perceptually weighted spectral reconstruction distortion rather than the distortion of the MFCC vector itself. To reduce computational complexity, we propose a practical suboptimal codebook search technique and embed it in the split and multistage VQ framework. Objective and subjective experimental results on Mandarin speech show that the proposed algorithm yields intelligible and natural-sounding speech at 600–2400 bit/s. Compared with the current VQ in the MFCC codec, the output speech quality is substantially improved in terms of frequency-weighted segmental SNR, short-time objective intelligibility score, perceptual evaluation of speech quality score, and mean opinion score.
- Published
- 2016
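The distinction the abstract draws, between picking the codeword nearest in the MFCC domain and picking the codeword whose reconstructed spectrum is nearest under a perceptual weighting, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the spectrum is reconstructed as a log-mel spectrum through a truncated inverse DCT, uses an exhaustive search over a single flat codebook (no split or multistage structure, no suboptimal search), and takes the weighting vector `weights` as a given input. The function names `dct_matrix`, `conventional_vq`, and `abs_vq` are hypothetical.

```python
import numpy as np


def dct_matrix(n_ceps, n_mels):
    """Orthonormal DCT-II basis (n_ceps x n_mels): log-mel energies -> cepstrum."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels)) * np.sqrt(2.0 / n_mels)
    basis[0] /= np.sqrt(2.0)  # scale the DC row so that the rows are orthonormal
    return basis


def conventional_vq(mfcc, codebook):
    """Index of the codeword closest to the MFCC vector itself."""
    return int(np.argmin(np.sum((codebook - mfcc) ** 2, axis=1)))


def abs_vq(log_mel_target, codebook, basis, weights):
    """Analysis-by-synthesis search (assumed form): synthesize a log-mel
    spectrum from every codeword and return the index with the smallest
    weighted spectral error against the target."""
    recon = codebook @ basis  # truncated inverse DCT, shape (K, n_mels)
    err = np.sum(weights * (recon - log_mel_target) ** 2, axis=1)
    return int(np.argmin(err))


# Toy data only: random codebook and target, not trained or measured values.
rng = np.random.default_rng(0)
n_ceps, n_mels, n_codewords = 13, 23, 16
basis = dct_matrix(n_ceps, n_mels)
codebook = rng.normal(size=(n_codewords, n_ceps))
log_mel = rng.normal(size=n_mels)
mfcc = basis @ log_mel

# Hypothetical weighting that emphasizes the lower mel bands.
weights = np.linspace(2.0, 0.5, n_mels)

idx_plain = conventional_vq(mfcc, codebook)
idx_abs = abs_vq(log_mel, codebook, basis, weights)
```

With uniform weights and an orthonormal DCT, the two searches select the same codeword, because the weighted spectral error then differs from the MFCC-domain error only by a constant. The searches can diverge only once the weighting is non-uniform, which is the case the abstract targets.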