Descriptor: "PSQM" / Topic: codec2 - Searchworks@Jio Institute Digital Library Search Results

1. Perceptually Weighted Analysis-by-Synthesis Vector Quantization for Low Bit Rate MFCC Codec

Author: Gang Min, Xia Zou, Jibin Yang, and Xiongwei Zhang
Subjects: Computer science, Speech recognition, Mean opinion score, Speech coding, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Intelligibility (communication), 030507 speech-language pathology & audiology, 03 medical and health sciences, Codec2, Distortion, 0202 electrical engineering, electronic engineering, information engineering, Codec, Electrical and Electronic Engineering, business.industry, Applied Mathematics, Codebook, Vector quantization, 020206 networking & telecommunications, Pattern recognition, PSQM, Adaptive Multi-Rate audio codec, Signal Processing, Mel-frequency cepstrum, Artificial intelligence, 0305 other medical science, business, Harmonic Vector Excitation Coding
Abstract: This letter presents a perceptually weighted analysis-by-synthesis vector quantization (VQ) algorithm for low bit rate MFCC codec. Different from conventional VQ of mel-frequency cepstral coefficients (MFCCs) vector, this algorithm uses an analysis-by-synthesis technique and aims to minimize the perceptually weighted spectral reconstruction distortion rather than the distortion of MFCCs vector itself. Also, to reduce the computational complexity, we propose a practical suboptimal codebook searching technique and embed it into the split and multistage VQ framework. Objective and subjective experimental results on Mandarin speech show that the proposed algorithm yields intelligible and natural sounding speech for speech coding at 600–2400 bit/s. Compared to current VQ in MFCC codec, the output speech quality is substantially improved in terms of frequency-weighted segmental SNR, short-time objective intelligibility score, perceptual evaluation of speech quality score, and mean opinion score.
Published: 2016
Full Text: View/download PDF

2. Wavelet energy based voice activity detection and adaptive thresholding for efficient speech coding

Author: Shijo M. Joseph and Anto P. Babu
Subjects: Discrete wavelet transform, Linguistics and Language, Voice activity detection, Computer science, Speech recognition, Speech coding, 020207 software engineering, 02 engineering and technology, PSQM, Linear predictive coding, Thresholding, Language and Linguistics, Human-Computer Interaction, 030507 speech-language pathology & audiology, 03 medical and health sciences, Wavelet, Codec2, 0202 electrical engineering, electronic engineering, information engineering, Computer Vision and Pattern Recognition, 0305 other medical science, Software
Abstract: During the last five decades, extensive research has been carried out in the field of speech compression, which has resulted in various techniques for speech coding. Researchers are still working toward more efficient speech coding in different parts of the world. In this paper we propose an alternative method for better speech coding. In the proposed technique we use the discrete wavelet transform to decompose the signal, and wavelet energy is used to differentiate between active voice regions and silence regions in the speech signal. Depending on the region's status, different thresholding strategies are chosen, which leads to better compression without any loss of speech intelligibility. The proposed method is evaluated in terms of qualitative and quantitative parameters. In this paper we also propose an alternative parameter for MOS values, hereafter known as System Recognition Rate.
Published: 2016
Full Text: View/download PDF
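
Illustrative sketch for record 2 (not the authors' implementation): the abstract describes using wavelet energy to separate active voice from silence and then choosing a thresholding strategy per region. The snippet below shows one way that idea could look, assuming a single-level Haar transform, a fixed energy gate, and hard thresholding. The function names and all numeric values (energy_gate, active_thr, silence_thr) are hypothetical and are not taken from the paper.

```python
import numpy as np

def haar_dwt(frame):
    """One-level orthonormal Haar DWT of an even-length frame."""
    x = np.asarray(frame, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass (approximation) coefficients
    detail = (even - odd) / np.sqrt(2.0)   # high-pass (detail) coefficients
    return approx, detail

def classify_and_threshold(frame, energy_gate=1e-3, active_thr=0.01, silence_thr=0.1):
    """Label a frame active/silent by wavelet energy, then hard-threshold it.

    All three numeric parameters are assumed values for illustration only.
    """
    approx, detail = haar_dwt(frame)
    # Mean wavelet energy per input sample.
    energy = (np.sum(approx ** 2) + np.sum(detail ** 2)) / len(frame)
    active = bool(energy > energy_gate)
    # Gentle threshold for speech, aggressive threshold for silence.
    thr = active_thr if active else silence_thr
    coeffs = np.concatenate([approx, detail])
    kept = np.where(np.abs(coeffs) >= thr, coeffs, 0.0)
    return active, kept

if __name__ == "__main__":
    t = np.arange(256)
    voiced = 0.5 * np.sin(2 * np.pi * 5 * t / 256)   # stand-in for a voiced frame
    silent = np.full(256, 1e-4)                      # stand-in for a silent frame
    for name, frame in (("voiced", voiced), ("silent", silent)):
        active, kept = classify_and_threshold(frame)
        print(name, active, int(np.count_nonzero(kept)))
```

Because silent frames get the larger threshold, almost all of their coefficients are zeroed, which is where the extra compression would come from under these assumptions.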

3. Two-Band Radial Postfiltering in Cepstral Domain with Application to Speech Synthesis

Author: Daniel Erro
Subjects: Computer science, Applied Mathematics, Speech recognition, Speech coding, 020206 networking & telecommunications, Speech synthesis, 02 engineering and technology, PSQM, Intelligibility (communication), Speech processing, Linear predictive coding, computer.software_genre, Speech enhancement, 030507 speech-language pathology & audiology, 03 medical and health sciences, Codec2, Signal Processing, Cepstrum, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, 0305 other medical science, computer
Abstract: Postfiltering is a well known technique that helps increasing the quality of coded speech, enhancing speech intelligibility or alleviating the oversmoothing effect of statistical speech processing methods. This letter presents a new formulation of the radial cepstral postfiltering method that enables the application of different postfiltering factors to low and high frequencies. The transition between bands will be smooth and controllable through an adjustable cut-off frequency. The proposed algorithm can be implemented by means of a simple multiplicative matrix of which an analytical expression is derived. The new method provides a flexible framework to tackle issues related to the quality and intelligibility of synthetic speech.
Published: 2016
Full Text: View/download PDF

4. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications

Author: Fumiya Yokomori, Kenji Ozawa, and Masanori Morise
Subjects: Voice activity detection, Computer science, Speech recognition, Real-time computing, Speech coding, 020206 networking & telecommunications, Speech synthesis, 02 engineering and technology, PSQM, Intelligibility (communication), computer.software_genre, Linear predictive coding, Speech processing, Codec2, Artificial Intelligence, Hardware and Architecture, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, computer, Software
Published: 2016
Full Text: View/download PDF

5. A long term harmonic plus noise model for narrow-band speech coding at very low bit-rates

Author: Sonia Djaziri-Larbi and Faten Ben Ali
Subjects: Voice activity detection, Computer science, Speech recognition, Speech coding, PSQM, Linear predictive coding, 030507 speech-language pathology & audiology, 03 medical and health sciences, Noise, Adaptive Multi-Rate audio codec, Codec2, Voice, Codec, Active listening, 0305 other medical science, PESQ
Abstract: This paper presents a very low bit-rate speech codec based on the long-term Harmonic plus Noise Model (LT-HNM). The HNM is known to be efficient in terms of speech signal representation, thanks to the use of natural parameters: fundamental and voicing cut-off frequencies, harmonics and noise frequencies. Besides, the long-term modeling is particularly efficient in reducing the data size of the model parameters. In this paper we combine both approaches, long-term modeling and HNM, to develop a very low bit-rate coder for narrowband speech. The obtained bit-rates are as low as 2.3 kbps with objective listening quality (perceptual evaluation of speech quality PESQ) of 2.3.
Published: 2017
Full Text: View/download PDF

6. Performance Analysis of Advanced Hybrid Speech Coding Techniques in Time domain, Spectral domain and Perceptual domain

Author: Talaat A. Elgarf, Mohamed M. Fouad, and Eslam Samy El-Mokadem
Subjects: Voice activity detection, Codec2, Computer science, Speech recognition, Speech coding, PSQM, Intelligibility (communication), Linear predictive coding, Speech processing, PESQ
Abstract: Speech coding is the art of creating a minimally redundant representation of the speech signal that can be efficiently transmitted or stored in digital media and decoding the signal with the best possible perceptual quality. The speech transmission in wireless networks is associated with the reduction of extra information present in the signal in such a way as to preserve the quality and intelligibility of speech. It is known that the lower the bit rate, the lower the quality of the reconstructed speech; however, there is a constant quest to achieve better speech quality at lower bit-rates. This paper presents a performance analysis of the quality of advanced hybrid speech coding techniques in the time domain, spectral domain and perceptual domain. These analyses are implemented on three different algorithms of advanced hybrid speech coding techniques, namely CELP, G729 Annex A and G723.1, to assess the quality performance for an English female speaker, an English male speaker and an Arabic female speaker by using a MATLAB simulation program. Our evaluation criteria include the following tests: Signal to Noise Ratio (SNR), Segmental Signal to Noise Ratio (SNRseg), the Log-Likelihood Ratio (LLR), the Weighted Spectral Slope (WSS), Absolute Error, Perceptual Evaluation of Speech Quality (PESQ), rating of speech distortion, rating of background noise and the predicted rating of overall quality.
Published: 2014
Full Text: View/download PDF

7. Research on an Embedded Ultra-Low-Bit-Rate Speech Coding Algorithm

Author: Fei Yuan, Yan Hong Fan, Ye Li, and Xiaomei Xu
Subjects: Code-excited linear prediction, Voice activity detection, Computer science, Speech recognition, Speech coding, General Engineering, Full Rate, PSQM, Linear predictive coding, Speech processing, Multi-Band Excitation, Codec2, Adaptive Multi-Rate audio codec, Mixed-excitation linear prediction, Vector sum excited linear prediction, Harmonic Vector Excitation Coding
Abstract: Ultra-low-bit-rate speech coding algorithms are in great demand in many fields such as underwater speech communications. Underwater speech communication over middle to long distances is characterized by narrow bandwidth as well as low transmission rate, which makes underwater speech communication very difficult. Ultra-low-bit-rate speech coding algorithms play an important role in this setting. Moreover, the underwater speech communication system is more flexible if the speech coding algorithm has an embedded structure. The paper introduces the principle of an embedded speech coding algorithm with dual rates at both 300bps and 400bps based on the enhanced mixed excitation linear prediction model. The results show that this embedded ultra-low-bit-rate speech coding algorithm has satisfactory quality under both DRT and MOS tests.
Published: 2014
Full Text: View/download PDF

8. An Automatic Broadcast System for a Weather Report Radio Program

Author: H. Segi, R. Takou, T. Takagi, N. Seiyama, Shinji Ozawa, Yuko Uematsu, and Hideo Saito
Subjects: Voice activity detection, Computer science, Speech recognition, Acoustic model, Speech synthesis, PSQM, Linear predictive coding, computer.software_genre, Speech processing, Codec2, Media Technology, Radio program, Electrical and Electronic Engineering, computer
Abstract: Here we describe a speech-synthesis method using templates that can generate recording-sentence sets for speech databases and produce natural sounding synthesized speech. Applying this method to the Japan Broadcasting Corporation (NHK) weather report radio program reduced the size of the recording-sentence set required to just a fraction of that needed by a comparable method. After integrating the recording voice of the generated recording-sentence set into the speech database, speech was produced by a voice synthesizer using templates. In a paired-comparison test, 66% of the speech samples synthesized by our system using templates were preferred to those produced by a conventional voice synthesizer. In an evaluation test using a five-point mean opinion score (MOS) scale, the speech samples synthesized by our system scored 4.97, whereas the maximum score for commercially available voice synthesizers was 3.09. In addition, we developed an automatic broadcast system for the weather report program using the speech-synthesis method and speech-rate converter. The system was evaluated using real weather data for more than 1 year, and exhibited sufficient stability and synthesized speech quality for broadcast purposes.
Published: 2013
Full Text: View/download PDF

9. Encoding Navigable Speech Sources: A Psychoacoustic-Based Analysis-by-Synthesis Approach

Author: Jiangtao Xi, Christian Ritz, and Xiguang Zheng
Subjects: Voice activity detection, Acoustics and Ultrasonics, Computer science, Speech recognition, Speech coding, Acoustic model, Speech synthesis, PSQM, Linear predictive coding, computer.software_genre, Speech processing, Codec2, Electrical and Electronic Engineering, computer
Abstract: This paper presents a psychoacoustic-based analysis-by-synthesis approach for compressing navigable speech sources. The approach targets multi-party teleconferencing applications, where selective reproduction of individual speech sources is desired. Based on exploiting sparsity of speech in the perceptual time-frequency domain, multiple speech signals are encoded into one mono mixture signal, which can be further compressed using a standard speech codec. Using side information indicating the active speech source for each time frequency instant enables flexible decoding and reproduction. Objective results highlight the importance of considering perception when exploiting the sparse nature of speech in the time-frequency domain. Results show that this sparsity, as measured by the preserved energy level of perceptually important time-frequency components extracted from mixtures of speech signals, is similar in both anechoic and reverberant environments. The proposed approach is applied to a series of simulated and real reverberant speech recordings, where the resulting speech mixtures are compressed using a standard speech codec operating at 32 kbps. The perceptual quality, as judged both by objective and subjective evaluations, outperforms a simple sparsity approach that does not consider perception as well as the approach that encodes each source separately. While the perceptual quality of individual speech sources is maintained, subjective tests also confirm the approach maintains the perceptual quality of the spatialized speech scene.
Published: 2013
Full Text: View/download PDF

10. Artificial bandwidth extension to improve automatic emotion recognition from narrow-band coded speech

Author: Catherine Sandoval Rodriguez, Abas Albahri, and Margaret Lech
Subjects: Voice activity detection, Computer science, Speech recognition, 0206 medical engineering, Speech coding, Acoustic model, 02 engineering and technology, PSQM, Speech processing, Speaker recognition, Linear predictive coding, 020601 biomedical engineering, 030507 speech-language pathology & audiology, 03 medical and health sciences, Codec2, 0305 other medical science
Abstract: Narrow-band speech coding techniques were previously found to reduce the accuracy of automatic Speech Emotion Recognition (SER), as well as speech and speaker recognition rates. Artificial Bandwidth Extension (ABE) based on spectral folding and spectral envelope estimation has been applied to compressed narrowband speech to test if an improvement in SER can be achieved. The modelling and classification of speech was performed with a benchmark approach based on the GMM classifier and a set of speech acoustic parameters including MFCCs, TEO and glottal parameters. The tests used the Berlin Emotional Speech data base. In general, ABE led to an improvement of SER accuracy; however the amount of improvement varied between different features, genders, and speech compression rates. In all cases, SER accuracy with ABE was at least 10% lower than for uncompressed speech.
Published: 2016
Full Text: View/download PDF

11. Effects of band reduction and coding on speech emotion recognition

Author: Abas Albahri and Margaret Lech
Subjects: Voice activity detection, Computer science, Speech recognition, 0206 medical engineering, Speech coding, 02 engineering and technology, PSQM, Speech processing, Linear predictive coding, 020601 biomedical engineering, 01 natural sciences, Codec2, Frequency domain, 0103 physical sciences, Mel-frequency cepstrum, 010301 acoustics
Abstract: Majority of Speech Emotion Recognition results refer to full-band uncompressed speech signals. Potential applications of SER on various types of speech platforms pose important questions about potential effects of bandwidth limitations and compression techniques used by speech communication systems on the accuracy of SER. The current study provides answers to these questions based on SER experiments with a band-limited speech as well as compressed speech. Compression techniques included AMR, AMR-WB, AMR-WB+ and mp3 methods. The modelling and classification of speech emotions was achieved using a benchmark approach based on the GMM classifier and speech features including MFCCs, TEO and glottal time and frequency domain parameters. The tests used the Berlin Emotional Speech database with speech signals sampled at 16 kHz. The results indicated that the low frequency components (0–1 kHz) of speech as well as, the high frequency components (above 4 kHz) play an important role in SER. The mp3 compression worked better with the MFCC features than with the TEO and glottal parameters. The AMR-WB and AMR-WB+ outperformed the AMR.
Published: 2016
Full Text: View/download PDF

12. An 8.3mW 1.6Msamples/s multi-modal event-driven speech enhancement processor for robust speech recognition in smart glasses

Author: Hoi-Jun Yoo, Injoon Hong, Jinmook Lee, and Seong-Wook Park
Subjects: Voice activity detection, Computer science, Speech recognition, Speech coding, 020206 networking & telecommunications, Clock gating, 02 engineering and technology, PSQM, Linear predictive coding, Speech processing, Speech enhancement, 030507 speech-language pathology & audiology, 03 medical and health sciences, Codec2, 0202 electrical engineering, electronic engineering, information engineering, 0305 other medical science
Abstract: A low-power and high-speed speech enhancement processor for speech enhancement of noisy inputs is proposed to realize the robust speech recognition in smart glasses. It has 3 key schemes: multi-modal speech selection, look-up table based non-linear approximation circuits, and speech detection controlled dynamic clock gating. The multi-modal speech selection scheme uses three parameters to enhance the limited accuracy of the previous uni-modal user speech selection up to 98.1%. The non-linear function approximation circuit accelerates the throughput of the speech enhancement by 10.7×. The speech detection controlled clock gating reduces the redundant power consumption by 51% when there is no user voice. The proposed speech enhancement processor achieves 1.6Msamples/s throughput and 8.3mW average power consumption with the 98.1% true positive rate of speech selection in 65nm CMOS process.
Published: 2016
Full Text: View/download PDF

13. 400bps High-Quality Speech Coding Algorithm

Author: Jingsai Jiang, Ye Li, Xiaofeng Ma, Qiuyun Hao, Peng Zhang, and Yanhong Fan
Subjects: Codec2, Speech recognition, Speech coding, Vector quantization, PSQM, Intelligibility (communication), Linear predictive coding, Vector sum excited linear prediction, Harmonic Vector Excitation Coding, Mathematics
Abstract: Low bit rate speech coding is important to speech communications over band-limited or harsh channels. In this paper, based on the mixed excitation linear prediction (MELP) model, we propose a high-quality 400bps low bit rate speech coding algorithm which introduces multi-frame joint vector quantization, adaptive spectral enhancement and multi-band sinusoidal mixed excitation. Efficient parameter quantization schemes are employed on the basis of the super-frame structure. It is verified that the synthesized speech has fairly high intelligibility and naturalness, and the mean opinion score (MOS) is about 2.52.
Published: 2016
Full Text: View/download PDF

14. Real-time speech communication system based on optimized G.723.1

Author: Kang Han, Haoyu Chen, Wanggen Wan, and Guoliang Chen
Subjects: Voice activity detection, Computer science, 020208 electrical & electronic engineering, Real-time computing, Speech coding, 02 engineering and technology, PSQM, Speech processing, Linear predictive coding, 020202 computer hardware & architecture, Codec2, Adaptive Multi-Rate audio codec, Audio codec, 0202 electrical engineering, electronic engineering, information engineering
Abstract: A qualified speech communication is quite important in modern communication. Thus, the implementation of security algorithms is also very important to achieve real-time applications. In this paper, we design a robust and full-duplex real-time speech communication system based on Texas Instruments' 32-bit floating point DSP TMS320C6748. We use ITU-T G.723.1 as the audio codec in this system. According to the hardware of the system and also to meet the requirement of high-quality and real-time communication, several methods are introduced in this paper to optimize the algorithm. Through the optimization, the time spent on compression and decompression is reduced from more than 1000ms to about 10ms, which guarantees real-time communication. Our designed system has been successfully used for the speech communication of a railway system.
Published: 2016
Full Text: View/download PDF

15. Development and testing of the voice activity detector based on use of special pilot signal

Author: Vladimir Vityazev and Vladimir A. Volchenkov
Subjects: Voice activity detection, Codec2, Noise (signal processing), Computer science, Speech recognition, Speech coding, Detector, PSQM, Linear predictive coding, Speech processing
Abstract: A voice activity detector (VAD) is a device which analyses a speech signal and generates the signal corresponding to the period containing only noise. The present work offers a VAD intended for voice coding algorithms which primarily require correct detection of pauses in human speech.
Published: 2016
Full Text: View/download PDF

16. A steganography scheme in a low-bit rate speech codec based on 3D-sudoku matrix

Author: Fufang Li, Xueshun Peng, and Yongfeng Huang
Subjects: 021110 strategic, defence & security studies, Voice activity detection, Steganography, Computer science, Speech recognition, Speech coding, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 0211 other engineering and technologies, Data_CODINGANDINFORMATIONTHEORY, 02 engineering and technology, PSQM, Enhanced Variable Rate Codec, Full Rate, Adaptive Multi-Rate audio codec, Codec2, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing
Abstract: Redundant information of low-bit rate speech is extremely small, thus it's very difficult to implement large capacity steganography on the low-bit rate speech. Based on multiple vector quantization characteristics of the Line Spectrum Pair (LSP) of the speech codec, this paper proposes a steganography scheme using a 3D-sudoku matrix to enlarge capacity and improve quality of speech. A cyclically moving algorithm to construct 3D-Sudoku matrix for steganography is proposed in this paper, as well as an embedding and an extracting algorithm of steganography based on 3D-Sudoku matrix in low-bit rate speech codec. Theoretical analysis is provided to demonstrate that the concealment and the hidden capacity are greatly improved with the proposed scheme. Experimental results show the hidden capacity is raised to 200bps in ITU-T G.723.1 codec. Moreover, the quality of steganography speech in Perceptual Evaluation of Speech Quality (PESQ) reduces no more than 4%, indicating little impact on the quality of speech.
Published: 2016
Full Text: View/download PDF

17. On the use of discrete wavelet transform for robust scalable speech coding

Author: Tokunbo Ogunfunmi and Koji Seto
Subjects: Discrete wavelet transform, Voice activity detection, Modified discrete cosine transform, Computer science, Speech recognition, 020208 electrical & electronic engineering, Speech coding, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Data_CODINGANDINFORMATIONTHEORY, 02 engineering and technology, PSQM, Wavelet packet decomposition, 030507 speech-language pathology & audiology, 03 medical and health sciences, Wavelet, Codec2, Adaptive Multi-Rate audio codec, 0202 electrical engineering, electronic engineering, information engineering, Codec, 0305 other medical science
Abstract: We developed scalable narrowband and wideband speech coding schemes based on the internet low bitrate codec (iLBC). Some of these newer codecs used the Discrete Wavelet Transform (DWT) instead of the Modified Discrete Cosine Transform (MDCT). This paper explores the choice of wavelet packet transform (WPT) for an application for a new scalable speech codec for IP networks using the Discrete Wavelet Transform (DWT) to encode the core-layer coding error in the enhancement layer. The issues regarding the design and in particular the choice of wavelet for the wideband codec are discussed. Experimental simulation results show that the DWT is a promising technique to use for encoding highly non-stationary signals such as the speech coding error. The wideband codec achieved speech quality equivalent to ITU-G.718 and similar codecs and is more robust. We also show that the best choice of wavelet depends on many factors including the order and number of levels of the wavelet tree, delay and how well it approximates the human auditory system.
Published: 2016
Full Text: View/download PDF

18. High capacity information hiding scheme using VAD algorithm

Author: Rong-San Lin
Subjects: 021110 strategic, defence & security studies, Voice activity detection, Steganography, Computer science, Speech recognition, Speech coding, 0211 other engineering and technologies, 02 engineering and technology, PSQM, Codec2, Information hiding, 0202 electrical engineering, electronic engineering, information engineering, Codec, 020201 artificial intelligence & image processing, Bitstream, Algorithm, PESQ
Abstract: This paper proposes a high-capacity hiding scheme for embedding a secret message in the inactive frames of a low-bit rate speech bitstream. Our information-hiding scheme can correctly extract the secret message at the receiver. The scheme uses a flag to synchronize the embedding and extraction processes in steganography. The results of an imperceptibility evaluation indicate that the average perceptual evaluation of speech quality (PESQ) score is degraded only slightly, by 0.023, relative to that of the original speech without hidden data. Experimental results show that our proposed hiding algorithm not only achieves perfect imperceptibility but also produces a high data-embedding rate, on average 26% for 5.3k bit/s coding.
Published: 2016
Full Text: View/download PDF

19. An impact of wideband speech codec mismatch on a performance of GMM-UBM speaker verification over telecommunication channel

Author: Roman Jarina, Peter Pocta, and Jozef Polacky
Subjects: Speaker diarisation, Voice activity detection, Adaptive Multi-Rate audio codec, Codec2, Computer science, Speech recognition, Codec, Data_CODINGANDINFORMATIONTHEORY, PSQM, Speaker recognition, Wideband audio
Abstract: An automatic verification of person's identity from its voice is a part of modern telecommunication services. In order to execute a verification task, a speech signal has to be transmitted to a remote server. So, a performance of the verification system can be influenced by various distortions that can occur when transmitting a speech signal through a communication channel. This paper studies an effect of the state of art wideband (WB) speech codecs on a performance of automatic speaker verification in the context of a channel/codec mismatch between enrollment and test utterances. The speaker verification system is developed on GMM-UBM method. The results show that EVS codec provides the best performance over all the investigated scenarios in this study. Moreover, deploying G.729.1 codec in a training process of the verification system provides the best equal error rate in the fully-codec mismatched scenario. Anyhow, differences between the equal error rates reported for all of the codecs involved in this scenario are mostly nonsignificant.
Published: 2016
Full Text: View/download PDF

20. Influence of packet loss on a speaker verification system over IP network

Author: Roman Jarina, Peter Pocta, and Jozef Polacky
Subjects: Voice activity detection, Voice over IP, Computer science, business.industry, Speech recognition, ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS, Speech coding, Data_CODINGANDINFORMATIONTHEORY, 02 engineering and technology, PSQM, Linear predictive coding, 030507 speech-language pathology & audiology, 03 medical and health sciences, Codec2, Transmission (telecommunications), Packet loss, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0305 other medical science, business, Computer network
Abstract: The paper considers an influence of packet loss on a remote speaker verification in Voice over IP (VoIP) environment. A lossy speech coding and packet loss represent a significant part of speech degradation in the VoIP environment. As an extent of packet loss impact is tightly related to a type of speech coder used to transmit speech data, different transmission conditions along with different speech codecs are investigated here. The speaker verification system used in this experimental study is based on a probabilistic GMM-UBM approach. In this paper, a speaker verification accuracy is evaluated against a level of packet loss in narrowband and wideband communication channel.
Published: 2016
Full Text: View/download PDF

21. CELP and MELP speech coding techniques

Author: Rhutuja Jage and Savitha Upadhya
Subjects: Code-excited linear prediction, Voice activity detection, Computer science, Speech recognition, Mean opinion score, Speech coding, PSQM, Full Rate, Linear predictive coding, Coding gain, Signal-to-noise ratio, Codec2, Mixed-excitation linear prediction, Bit rate, Voice, Vector sum excited linear prediction, Harmonic Vector Excitation Coding, Data compression
Abstract: Speech is one of the natural ways of communication amongst humans. Nowadays there is insatiable demand for speech communication as it carries more information like speaker identity, emotional state, prosodic nuance which adds naturalness in communication. With rapid growth and increased number of applications there exists a need for devising an approach for data compression techniques which reduces communication cost by using available bandwidth and storage space effectively. The speech coding techniques helps to achieve bit rate reduction by simultaneously maintaining original speech quality. In this paper, Hybrid speech coding technique i.e. Code Excited Linear Prediction (CELP) and Parametric coding technique i.e. Mixed Excitation Linear Prediction (MELP) are discussed and CELP technique is implemented using MATLAB. The parameters like mean square error (MSE), Mean Opinion Score (MOS), and Signal to Noise Ratio are calculated for CELP technique which shows that CELP technique is an improvement to a coder called Linear Predictive Coder (LPC). It is an efficient coding technique for the bit rate of 16–9.6 kbps. The MELP coder discussed here helps to remove the voicing error in two state excitation model of LPC. It is a low bit rate coder having a bit rate of 2.4 kbps and mainly used by military and federal standards.
Published: 2016
Full Text: View/download PDF

22. Assessment of automatic speaker verification on lossy transcoded speech

Author: Roman Jarina, Jozef Polacky, and Michal Chmulik
Subjects: 0209 industrial biotechnology, Voice activity detection, Computer science, Speech recognition, Acoustic model, Data_CODINGANDINFORMATIONTHEORY, 02 engineering and technology, PSQM, Speech processing, Speaker recognition, Speaker diarisation, 030507 speech-language pathology & audiology, 03 medical and health sciences, 020901 industrial engineering & automation, Adaptive Multi-Rate audio codec, Codec2, 0305 other medical science
Abstract: In this paper, we investigate the effect of lossy speech compression on the text-independent speaker verification task. We have evaluated voice biometrics performance over several state-of-the-art speech codecs, including the recently released Enhanced Voice Services (EVS) codec. The tests were performed in both codec-matched and codec-mismatched scenarios. The test results show that EVS outperforms the other speech codecs used in our tests and that it can be used to generate speaker models that are quite robust to varying compression levels. It was also shown that if a speech codec of higher quality (EVS, G.711) is included in the training data (mismatched and partially mismatched scenarios), automatic speaker verification (ASV) gives better results than in the matched scenario.
Published: 2016
Full Text: View/download PDF

23. Steganography Integration Into a Low-Bit Rate Speech Codec

Author: Yongfeng Huang, Shanyu Tang, Chenghao Liu, and Sen Bai
Subjects: Steganography tools, Steganalysis, Voice activity detection, Steganography, Computer Networks and Communications, Computer science, Speech recognition, Speech coding, PSQM, Full Rate, Cyber-security, Linear predictive coding, Adaptive Multi-Rate audio codec, Codec2, Codec, Safety, Risk, Reliability and Quality, Information-security
Abstract: Low bit-rate speech codecs have been widely used in audio communications such as VoIP and mobile communications, so steganography in low bit-rate audio streams would have broad applications in practice. In this paper, the authors propose a new algorithm for steganography in low bit-rate VoIP audio streams by integrating information hiding into the process of speech encoding. The proposed algorithm performs data embedding while pitch period prediction is conducted during low bit-rate speech encoding, thus maintaining synchronization between information hiding and speech encoding. The steganography algorithm not only achieves high speech quality and resists detection by steganalysis, but is also highly compatible with a standard low bit-rate speech codec, adding no further delay for data embedding and extraction. Testing shows that, with the proposed algorithm, the data embedding rate of the secret message can reach 4 bits/frame (133.3 bits/second).
Published: 2012
Full Text: View/download PDF

24. Comparative Analysis of Speech Compression Algorithms with Perceptual and LP based Quality Evaluations

Author: Nasir Saleem, Sher Ali, and Sunniya Nasir
Subjects: Voice activity detection, Codec2, business.industry, Computer science, Speech recognition, Bit rate, Speech coding, PSQM, business, Linear predictive coding, Digital signal processing, PESQ, Data compression
Abstract: Speech compression is one of the leading areas of digital signal processing; it focuses on reducing the bit rate of speech signals for transmission and storage without considerable loss of quality. In past decades many speech coding techniques have been proposed for speech analysis. This paper attempts to assess and compare two compression techniques on speech signals. To this end, we have chosen two widely used low bit rate speech analysis methods, VELP and MELP. The performance of both is evaluated with objective quality tests including PESQ, IS and CEP. Similar speech files are tested with both coders. The objective assessments show that at low bit rates MELP performs better than VELP.
Published: 2012
Full Text: View/download PDF

25. Speech Compression for Noise-Corrupted Thai Expressive Speech

Author: Suphattharachai Chomphan
Subjects: Voice activity detection, Codec2, Artificial Intelligence, Computer Networks and Communications, Computer science, Speech recognition, Mean opinion score, Speech coding, PSQM, Intelligibility (communication), Linear predictive coding, Software
Abstract: Problem statement: In speech communication, speech coding aims at preserving the speech quality at a lower coding bitrate. In a real communication environment, various types of noise degrade the speech quality. Expressive speech with different speaking styles may yield different speech quality under the same coding method. Approach: This research proposed a study of speech compression for noise-corrupted Thai expressive speech using two coding methods, CS-ACELP and MP-CELP. The speech material included a hundred male speech utterances and a hundred female speech utterances. Four speaking styles were included: enjoyable, sad, angry and reading styles. Five sentences of Thai speech were chosen. Three types of noise were included (train, car and air conditioner). Five levels of each type of noise were used, ranging from 0 to 20 dB. The subjective mean opinion score test was used in the evaluation process. Results: The experimental results showed that CS-ACELP gave better speech quality than MP-CELP at all three bitrates of 6000, 8600, and 12600 bps. When considering the levels of noise, the 20-dB noise gave the best speech quality, while the 0-dB noise gave the worst speech quality. When considering the speech gender, female speech gave better results than male speech. When considering the types of noise, the air-conditioner noise gave the best speech quality, while the train noise gave the worst speech quality. Conclusion: The study shows that the coding method, type of noise, level of noise, and speech gender all influence the coded speech quality.
Published: 2011
Full Text: View/download PDF

26. Analysis of Fundamental Frequency Contour of Coded Speech Based on Multi-Pulse Based Code Excited Linear Prediction Algorithm

Author: Suphattharachai Chomphan
Subjects: Code-excited linear prediction, Voice activity detection, Codec2, Artificial Intelligence, Computer Networks and Communications, Computer science, Speech recognition, Speech coding, PSQM, Fundamental frequency, Linear predictive coding, Vector sum excited linear prediction, Software
Abstract: Problem statement: In low-bit-rate speech communication, speech coding deteriorates the characteristics of the coded speech significantly. An important feature of speech is the fundamental frequency contour, which determines the pitch information of the speech. It is known that pitch information is one of the core parameters of the multi-pulse based code excited linear prediction (MP-CELP) speech coder. Therefore the deteriorated fundamental frequency contour should be studied properly. Approach: This study proposes an analysis of the fundamental frequency contour of coded speech based on the MP-CELP speech coder. The fundamental frequency contour of the natural speech is compared with that of the coded speech. The MP-CELP with three levels of bitrate scalability is selected as the core speech coder. The speech material includes a hundred male speech utterances and a hundred female speech utterances. Results: The experimental results show empirically that the speech coder deteriorates the fundamental frequency contour. The Root Mean Square Error (RMSE) between the fundamental frequency contour of the natural speech and that of the coded speech has been computed for three different bitrates. The lower the bitrate, the higher the RMSE. Conclusion: The study shows that the MP-CELP speech coder deteriorates the fundamental frequency contour of the transmitted speech.
Published: 2011
Full Text: View/download PDF

27. Multi-level error detection and concealment algorithm to improve speech quality in GSM full rate speech codecs

Author: Ming Li, Xiaoqing Liu, Jia Liu, and Linfang Wang
Subjects: Multidisciplinary, Voice activity detection, Codec2, Computer science, GSM, Speech recognition, Speech coding, Bit error rate, Data_CODINGANDINFORMATIONTHEORY, Full Rate, PSQM, Linear predictive coding, Algorithm
Abstract: Digital mobile telecommunication systems, such as the global system for mobile (GSM) system, want to further improve speech communication quality without changing the channel encoders and decoders. Speech quality is most affected by residual bit errors in received speech frames. Conventional methods use binary decision strategies for error detection and concealment in frames. This paper presents a multi-level error detection and concealment algorithm for GSM full rate speech codec systems. The algorithm uses multi-source knowledge to detect and conceal speech frame errors at the frame, parameter, and even bit levels. Tests show that most corrupted frames can be appropriately concealed by this algorithm, resulting in MOS gains of more than 50% for real-world data tests.
Published: 2011
Full Text: View/download PDF

28. Thai Speech Coding Based On Conjugate-Structure Algebraic Code Excited Linear Prediction Algorithm

Author: Suphattharachai Chomphan
Subjects: Code-excited linear prediction, Voice activity detection, Codec2, Artificial Intelligence, Computer Networks and Communications, Computer science, Speech recognition, Speech coding, Algebraic code-excited linear prediction, PSQM, Linear predictive coding, Vector sum excited linear prediction, Software
Abstract: Problem statement: In mobile communication, speech coding aims at compressing speech with the lowest bitrate and highest quality for standard languages such as English, German and French. For other languages with different uttering styles, the encoded speech quality is not guaranteed at the same bitrate. An appropriate evaluation should be performed in order to improve the speech quality by applying suitable techniques. Approach: This study presents a comparison of the quality of speech encoded and decoded by the CS-ACELP coder according to the ITU-T G.729 standard. The purpose is to test the performance of the CS-ACELP coder on Thai speech and on English speech. Results: The study used two coding methods: (1) the CS-ACELP coder without Voice Activity Detection and (2) the CS-ACELP coder with Voice Activity Detection. An objective test was used to measure the speech quality in each case. The results show that both methods give Thai speech quality mostly below English speech quality; comparing the methods, for both Thai and English, method (2) gives better speech quality than method (1). Finally, we modified the coder by increasing the order of LP analysis to improve the Thai speech quality. Conclusion: The findings show that, without modification, the quality of Thai coding is not equivalent to that of English. After increasing the LP order from 10 to 12 or 14, the quality of Thai speech coding is clearly improved, but the coding rate also increases in order to carry the higher-order information.
Published: 2011
Full Text: View/download PDF

29. ITU-T G.711.1: extending G.711 to higher-quality wideband speech

Author: H. Ohmuro and Y. Hiwasaki
Subjects: Voice activity detection, Computer Networks and Communications, business.industry, Computer science, Speech coding, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Data_CODINGANDINFORMATIONTHEORY, Enhanced Variable Rate Codec, Full Rate, PSQM, Wideband audio, Computer Science Applications, Adaptive Multi-Rate audio codec, Codec2, Extended Adaptive Multi-Rate – Wideband, Codec, Electrical and Electronic Engineering, business, Bitstream, Computer hardware, Computer network
Abstract: In March 2008 the ITU-T approved a new wideband speech codec called ITU-T G.711.1. This Recommendation extends G.711, the most widely deployed speech codec, to 7 kHz audio bandwidth and is optimized for voice over IP applications. The most important feature of this codec is that the G.711.1 bitstream can be transcoded into a G.711 bitstream by simple truncation. G.711.1 operates at 64, 80, and 96 kb/s, and is designed to achieve very short delay and low complexity. ITU-T evaluation results show that the codec fulfils all the requirements defined in the terms of reference. This article presents the codec requirements and design constraints, describes how standardization was conducted, and reports on the codec performance and its initial deployment.
Published: 2009
Full Text: View/download PDF

30. Multi-stage speech enhancement for automatic speech recognition

Author: Namgook Cho, Young-Woo Lee, and Seung-Yeol Lee
Subjects: Voice activity detection, Noise measurement, Computer science, Speech recognition, Speech coding, Acoustic model, PSQM, Linear predictive coding, Speech processing, 01 natural sciences, Speech enhancement, 030507 speech-language pathology & audiology, 03 medical and health sciences, Signal-to-noise ratio, Codec2, 0103 physical sciences, 0305 other medical science, 010301 acoustics
Abstract: In this paper, we propose a multi-stage speech enhancement technique for speech recognition. At first, a multi-channel speech enhancement method takes advantage of the spatial information of speech source. Then, in the second stage, single-channel speech enhancement based on data-driven approach is adopted to improve performance of speech recognition at server side. This method can improve the quality of speech signal which maximizes the advantage of each speech enhancement technique. The experimental result shows that the proposed technique is superior to conventional multi-stage speech enhancement algorithms.
Published: 2016
Full Text: View/download PDF

31. A MFCC-Based CELP Speech Coder for Server-Based Speech Recognition in Network Environments

Author: Gil Ho Lee, Hong Kook Kim, and Jae Sam Yoon
Subjects: Code-excited linear prediction, Voice activity detection, Computer science, Applied Mathematics, Speech recognition, Speech coding, Speech technology, Word error rate, Acoustic model, PSQM, Speaker recognition, Speech processing, Linear predictive coding, Computer Graphics and Computer-Aided Design, Codec2, Signal Processing, Mel-frequency cepstrum, Electrical and Electronic Engineering, Vector sum excited linear prediction
Abstract: Existing standard speech coders can provide high quality speech communication. However, they tend to degrade the performance of automatic speech recognition (ASR) systems that use the reconstructed speech. The main cause of the degradation is that the linear predictive coefficients (LPCs), which are typical spectral envelope parameters in speech coding, are optimized for speech quality rather than for speech recognition performance. In this paper, we propose a speech coder using mel-frequency cepstral coefficients (MFCCs) instead of LPCs to improve the performance of a server-based speech recognition system in network environments. To develop the proposed speech coder at a low bit rate, we first explore the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. It is shown that the proposed speech coder has speech quality comparable to 8 kbps G.729 and that the ASR system using the proposed speech coder gives a relative word error rate reduction of 6.8% compared to the ASR system using G.729 on a large vocabulary task (AURORA4).
Published: 2007
Full Text: View/download PDF

32. Speech Compression by Polynomial Approximation

Author: James L. Flanagan, A. Karve, Sorin V. Dusan, and Mridul Balaraman
Subjects: Voice activity detection, Acoustics and Ultrasonics, Codec2, Computer science, Speech recognition, Speech coding, Data_CODINGANDINFORMATIONTHEORY, PSQM, Electrical and Electronic Engineering, Intelligibility (communication), Linear predictive coding, Speech processing, Data compression
Abstract: Methods for speech compression aim at reducing the transmission bit rate while preserving the quality and intelligibility of speech. These objectives are antipodal in nature since higher compression presupposes preserving less information about the original speech signal. This paper presents a method for compressing speech based on polynomial approximations of the trajectories in time of various speech features (i.e., spectrum, gain, and pitch). The compression method can be integrated into frame-based speech coders, and can also be applied to features that can be represented as temporal series greater in duration than the frame interval. Theoretical issues and experimental results regarding this type of compression are addressed in this paper. Experimental implementation into a 2400 b/s standard speech coder is reported along with objective and subjective evaluations of operation in various noise environments. The new speech coder operates at a transmission rate of 1533 b/s, and for all noisy conditions tested performs better than the 2400 b/s standard speech coder.
Published: 2007
Full Text: View/download PDF

33. Speech Coding Techniques and ITU-T Standards for Telecommunication

Author: Shigeaki Sasaki and Hitoshi Ohmuro
Subjects: Multi-Band Excitation, Voice activity detection, Codec2, Adaptive Multi-Rate audio codec, Computer science, Speech recognition, Speech coding, Speech technology, PSQM, Electrical and Electronic Engineering
Published: 2007
Full Text: View/download PDF

34. A comfort noise addition post-processor for enhancing low bit-rate speech coding in noisy environments

Author: Martin Dietz, Ravelli Emmanuel, Guillaume Fuchs, and Anthony Lombard
Subjects: Speech enhancement, Background noise, Voice activity detection, Codec2, Computer science, Speech recognition, Speech coding, PSQM, Speech processing, Linear predictive coding
Abstract: At low bit-rates, speech coders relying on a single source model have difficulties to properly render speech and background noise simultaneously. Difficulties can get even bigger when using speech enhancement techniques within the coding scheme: these have shown to improve quality for clean speech, but introduce unpleasant instabilities under noisy conditions. This paper presents a novel approach named Comfort Noise Addition (CNA) for post-processing noisy speech coding. It continuously injects comfort noise in both active and inactive segments of the decoded speech. Based on an accurate noise estimate, it enhances the reproduction of the background noise while masking artefacts from the coding and the speech enhancement. Listening tests confirm that CNA allows speech coders to compensate for the limitations of speech enhancement and to significantly improve the quality of noisy speech at low bit-rates. CNA was adopted in the recent 3GPP codec for Enhanced Voice Services (EVS).
Published: 2015
Full Text: View/download PDF

35. Audio bandwidth detection in the EVS codec

Author: Wolfgang Jaegers, Vaclav Eksler, and Milan Jelinek
Subjects: Adaptive Multi-Rate audio codec, Codec2, Computer science, Speech recognition, Bandwidth (signal processing), Speech coding, Bandwidth extension, Codec, Data_CODINGANDINFORMATIONTHEORY, PSQM, Enhanced Variable Rate Codec
Abstract: Speech and audio codecs are usually designed such that they encode all the frequency bands of the input signal spectrum. If the higher bands do not contain any perceptually meaningful content, these codecs often do not work optimally as they assign part of the available bit budget to encode these bands. In this paper we describe a bandwidth detection algorithm that determines the effective audio bandwidth of the input signal. This information is used to set the codec to its optimal configuration and consequently increase the coding efficiency for band-limited signals by allocating bits to encode only the useful bandwidth. The presented algorithm has been used in the new codec for Enhanced Voice Services (EVS), recently standardized by 3GPP, but it can be employed in other codecs as well.
Published: 2015
Full Text: View/download PDF

36. Robust speech coding with EVS

Author: Henri Toukomaa, Anssi Rämö, and Adriana Vasilache
Subjects: Voice activity detection, Adaptive Multi-Rate audio codec, Codec2, Computer science, Speech recognition, Mean opinion score, Speech coding, Acoustic model, PSQM, Linear predictive coding
Abstract: This paper discusses the voice and audio quality characteristics of EVS, the recently standardized 3GPP codec. Especially frame erasure conditions were evaluated. Comparison to industry standard voice codecs: 3GPP AMR and AMR-WB as well as direct signals at varying bandwidths was made. Speech quality was evaluated with two subjective listening tests containing clean and noisy speech in Finnish language. Five different random frame erasure rates were evaluated: 0 %, 3 %, 6 %, 10 % and 15 %. Nine-scale subjective mean opinion score was calculated for all tested conditions.
Published: 2015
Full Text: View/download PDF

37. On a robust ASR based on complex AR speech analysis

Author: Keiichi Funaki and Keita Higa
Subjects: Speech enhancement, Voice activity detection, Codec2, Computer science, Speech recognition, Speech coding, Acoustic model, PSQM, Linear predictive coding, Speech processing
Abstract: The advanced front-end (AFE) for automatic speech recognition (ASR) was standardized by the European Telecommunications Standards Institute (ETSI). The AFE provides speech enhancement realized by an iterative Wiener filter (IWF) in which a smoothed FFT spectrum over adjacent frames is used to design the filter. We have previously proposed robust time-varying complex AR (TV-CAR) speech analysis and evaluated the performance of speech processing such as F0 estimation and speech enhancement. TV-CAR analysis can estimate more accurate spectrum than FFT, especially in low frequencies because of the nature of the analytic signal. In addition, the TV-CAR can estimate more accurate speech spectrum against additive noise. In this paper, the time-invariant version of wide-band TV-CAR analysis is introduced to the IWF in the AFE and is evaluated using the CENSREC-2 database.
Published: 2015
Full Text: View/download PDF

38. An Imperceptible Information Hiding in Encoded Bits of Speech Signal

Author: Rong-San Lin
Subjects: Voice activity detection, Codec2, Computer science, Speech recognition, Information hiding, Speech coding, PSQM, Full Rate, Linear predictive coding, PESQ
Abstract: This paper presents an imperceptible information hiding approach applied to the G.723.1 low-bit-rate codec, which is already being used extensively in Voice-over-Internet Protocol. We show that the encoded bits of the stochastic excitation pulse parameters are more suitable for data embedding than the encoded bits of other speech parameters. We also propose a voice-activity detection method that uses the residual signal energy of the speech signal to increase the data embedding capacity. The results of an imperceptibility evaluation indicate that the average perceptual evaluation of speech quality (PESQ) score is degraded only slightly, by 0.3, relative to that of cover speech with no hidden data. Experimental results show that our proposed hiding algorithm not only achieves perfect imperceptibility but also produces a high data-embedding capacity, on average 583 bits per second for 5.3 kbit/s coding.
Published: 2015
Full Text: View/download PDF

39. Automatic emotion recognition in compressed speech using acoustic and non-linear features

Author: Julián D. Arias-Londoño, N. Garcia, Juan Rafael Orozco-Arroyave, Jesús Francisco Vargas-Bonilla, and Juan Camilo Vásquez-Correa
Subjects: Voice activity detection, Computer science, business.industry, Speech recognition, Speech coding, Acoustic model, Pattern recognition, PSQM, Speaker recognition, Speech processing, Linear predictive coding, Codec2, Artificial intelligence, business
Abstract: Automatic recognition of emotions in speech has attracted the attention of the research community in recent years. Some of its most relevant proposed applications are in call centers. In these scenarios the speech is distorted by compression algorithms. The effects of such distortion on the performance of systems for automatic recognition of emotions must be assessed. In this study these effects are evaluated independently of any other distortions generated by the communications channel. Several state-of-the-art codecs are used to compress the speech signals of two emotional speech databases. The databases used are the Berlin Database of Emotional Speech and the enterface05. The methodology considers voiced and unvoiced segments of the speech separately. Spectral, cepstral, noise and Non-Linear Dynamics (NLD) measures are used to characterize the segments. Finally, a classifier based on a Gaussian Mixture Model (GMM) is used to identify the emotion. The results indicate that voiced segments are less affected by the compression than unvoiced ones in terms of classification accuracy. They also show that the bandwidth of the analyzed signals is an important factor in the classification results.
Published: 2015
Full Text: View/download PDF

40. Advances in speech and audio processing and coding

Author: Andreas Spanias
Subjects: Voice activity detection, Adaptive Multi-Rate audio codec, Codec2, Computer science, Speech recognition, Speech coding, Extended Adaptive Multi-Rate – Wideband, PSQM, Speech processing, Linear predictive coding
Abstract: This plenary session will cover speech processing research advances with the emphasis on speech and audio coding methods. In the session, we will discuss the fundamental principles, techniques, and algorithms used in current coding applications including a summary of codecs for telecommunication standards. The session will start with a discussion on: the basic speech representation methods, the performance measures used to evaluate coded speech, and the role of the standards. Brief algorithm descriptions include: ADPCM, sub-band coding, adaptive transform coding, sinusoidal transform coding (STC), linear predictive coding (LPC), and analysis-by-synthesis LPC (sparse excitation, code excited LPC, and ACELP). The presentation will feature audio, and computer demonstrations of recent speech coding standards including voice-over IP algorithms. The plenary session will also cover wideband audio standards such as MPEG audio and other layers (e.g., MP3, AAC). Recent algorithms will also be described including the following: Variable-Rate Multimode Wideband (VMR-WB), Speex, G722.1, OGG Vorbis 2012, iLBC, SELT, SILK, Opus 2013, Qualcomm wideband 5G codecs. At the end of the session, we will cover briefly recent applications that use voice features for detecting speech pathologies, and also discuss how long-term speech parameters can be used as predictors of other diseases such as tremors, Alzheimer's etc.
Published: 2015
Full Text: View/download PDF

41. Adaptive Speech Streaming Based on Speech Quality Estimation and Artificial Bandwidth Extension for Voice over Wireless Multimedia Sensor Networks

Author: Seong Ro Lee, Nam In Park, Jin Ah Kang, and Hong Kook Kim
Subjects: Voice activity detection, Article Subject, Computer Networks and Communications, Computer science, Speech recognition, Speech coding, General Engineering, Bandwidth extension, PSQM, Full Rate, Linear predictive coding, Speech processing, lcsh:QA75.5-76.95, Packet loss concealment, Codec2, Bit rate, lcsh:Electronic computers. Computer science
Abstract: In this paper, an adaptive speech streaming method is proposed to improve the perceived speech quality (PSQ) of voice over wireless multimedia sensor networks (WMSNs). First, the proposed method estimates the PSQ of the received speech data under different network conditions, which are represented by the packet loss rates (PLRs). Simultaneously, the proposed method classifies the speech signal as either an onset or a non-onset frame. Based on the estimated PSQ and the speech class, it determines an appropriate bit rate for the redundant speech data (RSD) that are transmitted with the primary speech data (PSD) to help reconstruct the speech signals of any lost frames. In particular, when the estimated PLR is high, the bit rate of the RSD should be increased by decreasing that of the PSD. Thus, the bandwidth of the PSD is changed from wideband to narrowband, and an artificial bandwidth extension technique is applied to the decoded narrowband speech. Simulation shows that the proposed method significantly improves the decoded speech quality under packet loss conditions in a WMSN, compared to a decoder-based packet loss concealment method and a conventional redundant speech transmission method.
Published: 2015

42. Subjective quality evaluation of the 3GPP EVS codec

Author: Henri Toukomaa and Anssi Rämö
Subjects: Voice activity detection, Noise measurement, Computer science, Mean opinion score, Speech recognition, Speech coding, Acoustic model, PSQM, Opus, Full Rate, Enhanced Variable Rate Codec, Linear predictive coding, Wideband audio, Codec2, Adaptive Multi-Rate audio codec, Bit rate, Extended Adaptive Multi-Rate – Wideband, Codec, Active listening, Sound quality
Abstract: This paper discusses the voice and audio quality characteristics of EVS, the recently standardized 3GPP codec. Comparison to Opus, IETF driven open source codec as well as industry standard voice codecs: 3GPP AMR and AMR-WB, and ITU-T G.718B, G.722.1C and G.719 as well as direct signals at varying bandwidths was made. Voice and audio quality was evaluated with three subjective listening tests containing clean and noisy speech in Finnish language as well as a mixed condition test containing both speech and music intermixed. Nine-scale subjective mean opinion score was calculated for all tested conditions.
Published: 2015
Full Text: View/download PDF

43. Two-stage speech/music classifier with decision smoothing and sharpening in the EVS codec

Author: Tommy Vaillancourt, Vladimir Malenovsky, Ki-hyun Choo, Wang Zhe, and Atti Venkatraman S
Subjects: Voice activity detection, Computer science, business.industry, Speech recognition, Speech coding, Pattern recognition, PSQM, Sharpening, Linear predictive coding, ComputingMethodologies_PATTERNRECOGNITION, Codec2, Codec, Artificial intelligence, business, Smoothing
Abstract: In most internationally recognized standardized multi-mode codecs, signal classification is performed in a single step by either linear discrimination or SNR-based metrics. The speech/music classifier of the EVS codec achieves greater discrimination than these single-step models by combining Gaussian mixture modelling (GMM) with a series of context-based improvement layers. Additionally, unlike traditional GMM classifiers the EVS model adopts a short hangover period, allowing it to track transitions between music and speech. Misclassifications are mitigated by applying a novel decision smoothing and sharpening technique. The results in relatively static environments demonstrate that the new two-stage approach with selective hangover leads to classification accuracies comparable to speech/music classifiers with longer hangovers. They also show that the new approach leads to faster and more accurate switching of coding modes than conventional classifiers for more complex audio environments such as advertisements, jingles and speech superimposed on music.
Published: 2015
Full Text: View/download PDF

44. Efficient handling of mode switching and speech transitions in the EVS codec

Author: Redwan Salami, Milan Jelinek, and Vaclav Eksler
Subjects: Code-excited linear prediction, Voice activity detection, Computer science, Speech recognition, Speech coding, Algebraic code-excited linear prediction, Linear prediction, PSQM, Full Rate, Linear predictive coding, Adaptive Multi-Rate audio codec, Codec2, Extended Adaptive Multi-Rate – Wideband, Electronic engineering, Codec, Vector sum excited linear prediction
Abstract: The recently standardized codec for Enhanced Voice Services (EVS) consists of a number of modes to achieve its high coding flexibility. In this paper we focus on techniques that enable a seamless switching between two linear prediction based modes running at different sampling rates within this codec. The first one deals with an efficient conversion of the linear prediction filter coefficients. The other one is based on a constrained-memory ACELP called transition coding (TC) that significantly limits the inter-frame long-term dependency. We show that the use of TC can be successfully extended to improve quality also in coding other transitions, e.g. strong onsets of voiced speech.
Published: 2015
Full Text: View/download PDF

45. New post-processing techniques for low bit rate CELP codecs

Author: Redwan Salami, Milan Jelinek, and Tommy Vaillancourt
Subjects: Code-excited linear prediction, Voice activity detection, Excitation signal, Audio signal, Noise measurement, Computer science, Speech recognition, Speech coding, Algebraic code-excited linear prediction, Data_CODINGANDINFORMATIONTHEORY, Full Rate, PSQM, Linear predictive coding, Background noise, Adaptive Multi-Rate audio codec, Codec2, Bit rate, Extended Adaptive Multi-Rate – Wideband, Discrete cosine transform, Codec, Harmonic Vector Excitation Coding
Abstract: This paper presents two new post-processing techniques to address limitations of the deployed low bit rate speech codecs in case of unvoiced speech and background noise, and in case of music. Both post-processing techniques enhance the spectrum of the decoded excitation signal without increasing the codec algorithmic delay. The paper discusses how to integrate the enhancement procedure of unvoiced speech and background noise and of generic audio signals coded by low bit rate ACELP codecs. The proposed post-processing procedures are part of the AMR-WB interoperable modes of the recently standardized 3GPP EVS codec [1].
Published: 2015
Full Text: View/download PDF

46. Super-wideband bandwidth extension for speech in the 3GPP EVS codec

Author: Duminda A. Dewasurendra, Daniel J. Sinder, Volodya Grancharov, Venkata Subrahmanyam Chandra Sekhar Chebiyyam, Shaminda Subasingha, Harald Pobloth, Venkatraman S. Atti, Jon Gibbs, Vivek Rajendran, Imre Varga, Lei Miao, and Venkatesh Krishnan
Subjects: Background noise, Codec2, Adaptive Multi-Rate audio codec, Computer science, Mean opinion score, Speech recognition, Bit rate, Speech coding, Bandwidth extension, Codec, Enhanced Variable Rate Codec, PSQM, Harmonic Vector Excitation Coding
Abstract: This paper describes the time-domain bandwidth extension (TBE) framework employed to code wideband and super-wideband speech in the newly standardized 3GPP EVS codec. The TBE algorithm uses a nonlinear harmonic modeling technique that incorporates principles of time-domain envelope-modulated noise mixing. At 13.2 kbps, the super-wideband coding of speech uses as low as 1.55 kbps for encoding the spectral content from 6.4–14.4 kHz. Subjective evaluation results from ITU-T P.800 Mean Opinion Score (MOS) tests are provided, showing significantly improved quality compared to the other standardized SWB codecs under both clean speech and speech with background noise.
Published: 2015
Full Text: View/download PDF

47. Low-complexity and robust coding mode decision in the EVS coder

Author: Emmanuel Ravelli, Guillaume Fuchs, Markus Multrus, and Christian Helmrich
Subjects: Voice activity detection, Computer science, Speech recognition, Speech coding, PSQM, Coding tree unit, Sub-band coding, Signal-to-noise ratio, Adaptive Multi-Rate audio codec, Codec2, Distortion, Codec, Algorithm, Vector sum excited linear prediction, Decoding methods, Harmonic Vector Excitation Coding
Abstract: Several state-of-the-art switched audio codecs employ the closed-loop mode decision to select the best coding mode at every frame. The closed-loop mode selection is known to have good performance but also high complexity. The new approach we propose in this paper is a low-complexity version of the closed-loop approach, based on similar decisions which compute the coding distortion of each mode and select the one with the lowest distortion. Our approach differs mainly in the way the coding distortions are calculated. We are able to notably reduce the complexity by only estimating the distortions without encoding and decoding the input for each mode. The new approach was implemented in the EVS codec standard and evaluated both objectively and subjectively. Compared to the closed-loop approach, it yields similar performance and lower complexity.
Published: 2015
Full Text: View/download PDF

48. Overview of the EVS codec architecture

Author: Erik Norvell, Vladimir Malenovsky, Hiroyuki Ehara, Adriana Vasilache, Ho-Sang Sung, Hao Yuan, Markus Multrus, Vivek Rajendran, Lei Miao, Zhe Wang, Yutaka Kamamoto, Lasse Juhani Laaksonen, Martin Dietz, Venkatraman S. Atti, Vaclav Eksler, Eunmi Oh, Changbao Zhu, Kei Kikuiri, Julien Faure, Stéphane Ragot, and Harald Pobloth
Subjects: Voice activity detection, Computer science, Real-time computing, Speech coding, Data_CODINGANDINFORMATIONTHEORY, PSQM, Enhanced Variable Rate Codec, Full Rate, Half Rate, Codec2, Adaptive Multi-Rate audio codec, Computer architecture, Audio codec, Bit rate, Extended Adaptive Multi-Rate – Wideband, Codec, Sound quality
Abstract: The recently standardized 3GPP codec for Enhanced Voice Services (EVS) offers new features and improvements for low-delay real-time communication systems. Based on a novel, switched low-delay speech/audio codec, the EVS codec contains various tools for better compression efficiency and higher quality for clean/noisy speech, mixed content and music, including support for wideband, super-wideband and full-band content. The EVS codec operates in a broad range of bitrates, is highly robust against packet loss and provides an AMR-WB interoperable mode for compatibility with existing systems. This paper gives an overview of the underlying architecture as well as the novel technologies in the EVS codec and presents listening test results showing the performance of the new codec in terms of compression and speech/audio quality.
Published: 2015
Full Text: View/download PDF

49. Arithmetic coding of speech and audio spectra using TCX based on linear predictive spectral envelopes

Author: Christian Helmrich and Tom Bäckström
Subjects: Computer science, Speech recognition, Tunstall coding, Speech coding, Linear prediction, Shannon–Fano coding, Codec2, Computer Science::Multimedia, Codec, Entropy encoding, Transform coding, Code-excited linear prediction, Voice activity detection, Variable-length code, PSQM, Linear predictive coding, Coding tree unit, Arithmetic coding, Sub-band coding, Adaptive Multi-Rate audio codec, Computer Science::Sound, Frequency domain, Vector sum excited linear prediction, Context-adaptive binary arithmetic coding, Harmonic Vector Excitation Coding, Context-adaptive variable-length coding
Abstract: Unified speech and audio codecs often use a frequency domain coding technique of the transform coded excitation (TCX) type. It is based on modeling the speech source with a linear predictor, spectral weighting by a perceptual model and entropy coding of the frequency components. While previous approaches have used neighbouring frequency components to form a probability model for the entropy coder of spectral components, we propose to use the magnitude of the linear predictor to estimate the variance of spectral components. Since the linear predictor is transmitted in any case, this method does not require any additional side information. Subjective measurements show that the proposed method gives a statistically significant improvement in perceptual quality when the bit-rate is held constant. Consequently, the proposed method has been adopted into the 3GPP Enhanced Voice Services speech coding standard.
Published: 2015
Full Text: View/download PDF

50. Advances in low bitrate time-frequency coding

Author: Tommy Vaillancourt, Milan Jelinek, Jon Gibbs, Redwan Salami, Lei Miao, Zexin Liu, and Vladimir Malenovsky
Subjects: Code-excited linear prediction, Voice activity detection, Audio signal, Computer science, Speech recognition, Speech coding, Algebraic code-excited linear prediction, Full Rate, PSQM, Linear predictive coding, Coding tree unit, Sub-band coding, Codec2, Adaptive Multi-Rate audio codec, Frequency domain, Bit rate, Extended Adaptive Multi-Rate – Wideband, Codec, Time domain, Vector sum excited linear prediction, Harmonic Vector Excitation Coding
Abstract: In this paper a novel technique is presented to efficiently mix traditional ACELP time domain coding with a frequency domain coding model to improve the quality of generic audio signals coded at low bitrates without additional delay. The paper discusses how to integrate parts of a traditional Algebraic Code Excited Linear Prediction (ACELP) speech codec to create a time-domain contribution which coexists with a frequency based coding model. A mechanism to determine the value of the time-domain contribution is proposed, and a method is described for adding the frequency-domain contribution without increasing the overall delay of the codec. The proposed method forms part of the recently standardised 3GPP EVS codec.
Published: 2015
Full Text: View/download PDF

291 results on '"PSQM"'
