315 results for "John Makhoul"
Search Results
152. Mechanical Inference Problems in Continuous Speech Understanding.
- Author
-
William A. Woods and John Makhoul
- Published
- 1974
- Full Text
- View/download PDF
153. Speaker Adaptation in a Limited Speech Recognition System.
- Author
-
John Makhoul
- Published
- 1971
- Full Text
- View/download PDF
154. Normalization of phonetic keyword search scores
- Author
-
Richard Schwartz, Stavros Tsakalidis, Ivan Bulyko, Damianos Karakos, Long Nguyen, and John Makhoul
- Subjects
Computer science ,business.industry ,Keyword search ,Speech recognition ,Artificial intelligence ,business ,computer.software_genre ,computer ,Natural language processing - Published
- 2014
- Full Text
- View/download PDF
155. The 2013 BBN Vietnamese telephone speech keyword spotting system
- Author
-
Long Nguyen, Guruprasad Saikumar, Le Zhang, Damianos Karakos, Tim Ng, Shivesh Ranjan, Roger Hsiao, John Makhoul, Stavros Tsakalidis, and Richard Schwartz
- Subjects
business.industry ,Computer science ,Speech recognition ,Vietnamese ,Keyword spotting ,language ,NIST ,Artificial intelligence ,business ,computer.software_genre ,computer ,Natural language processing ,language.human_language - Abstract
In this paper we describe the Vietnamese conversational telephone speech keyword spotting system developed under the IARPA Babel program for the 2013 evaluation conducted by NIST. The system contains several recently developed novel methods that significantly improve speech-to-text and keyword spotting performance, such as stacked bottleneck neural network features, white listing, score normalization, and improvements to semi-supervised training methods. These methods resulted in the highest performance in the official IARPA Babel surprise language evaluation of 2013.
- Published
- 2014
- Full Text
- View/download PDF
156. MULTILINGUAL MACHINE PRINTED OCR
- Author
-
Zhidong Lu, Issam Bazzi, John Makhoul, Richard Schwartz, and Premkumar Natarajan
- Subjects
Computer science ,business.industry ,Speech recognition ,Feature extraction ,Facsimile ,Image segmentation ,Optical character recognition ,Markov model ,computer.software_genre ,Constructed language ,Artificial Intelligence ,Robustness (computer science) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Hidden Markov model ,business ,computer ,Software - Abstract
This paper presents a script-independent methodology for optical character recognition (OCR) based on the use of hidden Markov models (HMM). The feature extraction, training and recognition components of the system are all designed to be script independent. The training and recognition components were taken without modification from a continuous speech recognition system; the only component that is specific to OCR is the feature extraction component. To port the system to a new language, all that is needed is text image training data from the new language, along with ground truth which gives the identity of the sequences of characters along each line of each text image, without specifying the location of the characters on the image. The parameters of the character HMMs are estimated automatically from the training data, without the need for laborious handwritten rules. The system does not require presegmentation of the data, either at the word level or at the character level. Thus, the system is able to handle languages with connected characters in a straightforward manner. The script independence of the system is demonstrated in three languages with different types of script: Arabic, English, and Chinese. The robustness of the system is further demonstrated by testing the system on fax data. An unsupervised adaptation method is then described to improve performance under degraded conditions.
- Published
- 2001
- Full Text
- View/download PDF
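Entry 156 describes taking the trainer and recognizer of a speech recognition system unchanged and confining the OCR-specific work to feature extraction over text-line images. The sketch below (Python/NumPy) illustrates one plausible shape of such a front end: scan narrow frames across a line image and emit one feature vector per frame, which an HMM system can then treat like a sequence of speech frames. The frame width, frame shift, and simple ink-density features are illustrative assumptions, not the features used in the paper.

    import numpy as np

    def line_image_to_frames(line_img, frame_width=8, frame_shift=4, n_bands=20):
        # Slice a text-line image (H x W, ink=1, background=0) into narrow
        # overlapping vertical frames scanned across the line, and compute a
        # small feature vector per frame: the mean ink density in n_bands
        # horizontal cells of the frame.
        H, W = line_img.shape
        band_edges = np.linspace(0, H, n_bands + 1).astype(int)
        frames = []
        for x in range(0, max(W - frame_width, 1), frame_shift):
            frame = line_img[:, x:x + frame_width]
            feats = [frame[band_edges[b]:band_edges[b + 1]].mean()
                     for b in range(n_bands)]
            frames.append(feats)
        return np.array(frames)          # shape: (n_frames, n_bands)

    # toy usage on a synthetic 40 x 300 "line image"
    obs = line_image_to_frames((np.random.rand(40, 300) > 0.8).astype(float))
    print(obs.shape)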
157. A SCRIPT-INDEPENDENT METHODOLOGY FOR OPTICAL CHARACTER RECOGNITION
- Author
-
Christopher LaPre, Richard Schwartz, John Makhoul, and Issam Bazzi
- Subjects
Character (computing) ,Arabic ,Computer science ,Speech recognition ,Feature extraction ,Image processing ,Optical character recognition ,Markov model ,computer.software_genre ,language.human_language ,Intelligent word recognition ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial Intelligence ,Signal Processing ,Pattern recognition (psychology) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,language ,Segmentation ,Computer Vision and Pattern Recognition ,Hidden Markov model ,computer ,Software - Abstract
We present a methodology for OCR that exhibits the following properties: script-independent feature extraction, training, and recognition components; no separate segmentation at the character and word levels; and the training is performed automatically on data that is also not presegmented. The methodology is adapted to OCR from continuous speech recognition, which has developed a mature and successful technology based on Hidden Markov Models. The script independence of the methodology is demonstrated using omnifont experiments on the DARPA Arabic OCR Corpus and the University of Washington English Document Image Database I.
- Published
- 1998
- Full Text
- View/download PDF
158. Towards Automatically Building Tutor Models Using Multiple Behavior Demonstrations
- Author
-
R. Bruce Roberts, Rohit Kumar, Matthew E. Roy, and John Makhoul
- Subjects
Multimedia ,Computer science ,Process (engineering) ,business.industry ,Tracing ,computer.software_genre ,Automation ,Human–computer interaction ,Scalability ,ComputingMilieux_COMPUTERSANDEDUCATION ,TUTOR ,business ,computer ,computer.programming_language - Abstract
Automation of tutor modeling can contribute to scalable development and maintenance of Intelligent Tutoring Systems (ITS). In this paper, we propose a modification to the process used to build Example Tracing tutors, a widely used type of tutor model. Our approach uses behavior demonstrations by multiple non-experts (such as learners) to automatically create a partially annotated, generalized tutor model.
- Published
- 2014
- Full Text
- View/download PDF
159. Score normalization and system combination for improved keyword spotting
- Author
-
Mirko Hannemann, Martin Karafiat, Viet Bac Le, Tim Ng, Lori Lamel, Igor Szöke, Ivan Bulyko, Richard Schwartz, Guruprasad Saikumar, Shivesh Ranjan, Damianos Karakos, Long Nguyen, Karel Vesely, Le Zhang, Frantisek Grezl, John Makhoul, Roger Hsiao, and Stavros Tsakalidis
- Subjects
Normalization (statistics) ,System combination ,Computer science ,business.industry ,Speech recognition ,Keyword spotting ,Query formulation ,Pattern recognition ,Maximization ,Artificial intelligence ,business - Abstract
We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and correspond more closely to the probability of being correct than raw posteriors do; and (ii) system combination, where the detections of multiple systems are merged and their scores are interpolated with weights optimized using MTWV as the maximization criterion. Both score normalization and system combination yield significant gains in ATWV/MTWV, sometimes on the order of 8-10 points (absolute), across five different languages. A variant of these methods resulted in the highest performance in the official surprise language evaluation for the IARPA-funded Babel project in April 2013.
- Published
- 2013
- Full Text
- View/download PDF
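Entry 159 describes keyword-specific score normalization that makes detection scores of different keywords commensurate before a single ATWV/MTWV threshold is applied. Below is a minimal sketch of one widely used variant, sum-to-one normalization; the exponent gamma and the exact formula used in the paper are assumptions.

    from collections import defaultdict

    def sum_to_one_normalize(detections, gamma=1.0):
        # detections: list of (keyword, time, raw_posterior) tuples.
        # Raise each raw posterior to the power gamma and divide by the total
        # mass for that keyword, so scores of rare and frequent keywords
        # become comparable before one global threshold is applied.
        totals = defaultdict(float)
        for kw, _, score in detections:
            totals[kw] += score ** gamma
        return [(kw, time, (score ** gamma) / totals[kw])
                for kw, time, score in detections]

    dets = [("hanoi", 1.2, 0.90), ("hanoi", 7.5, 0.30), ("mekong", 4.0, 0.05)]
    print(sum_to_one_normalize(dets))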
160. Extensible Adaptive System for STEM Learning
- Author
-
John Makhoul
- Subjects
business.industry ,Computer science ,Adaptive system ,Stem learning ,Workbench ,System administration ,Software engineering ,business ,Extensibility ,Kickoff meeting - Abstract
This is the second Quarterly Progress Report (QPR) (CDRL Data Item #A001) for EAITS, an extensible and adaptive STEM Learning System that BBN is developing under the ONR STEM Grand Challenge Program, Contract # N00014-12-C-0535. It covers the period April 7, 2013 through July 6, 2013. During the period of this quarterly report, our efforts focused on developing a prototype of the system workbench for content authoring and system administration. We also developed a briefing about our project that was presented at the ONR STEM Grand Challenge kickoff meeting on April 19, 2013. A revised version of this briefing was created and presented to ONR on June 14, 2013.
- Published
- 2013
- Full Text
- View/download PDF
161. Spoken Language Systems.
- Author
-
John Makhoul
- Published
- 1989
162. Improved HMM Models for High Performance Speech Recognition.
- Author
-
Steve Austin, Chris Barry, Yen-Lu Chow, Alan Derr, Owen Kimball, Francis Kubala, John Makhoul, Paul Placeway, William Russell, Richard M. Schwartz, and George Yu
- Published
- 1989
163. The BBN BYBLOS Continuous Speech Recognition System.
- Author
-
Richard M. Schwartz, Chris Barry, Yen-Lu Chow, Alan Derr, Ming-Whei Feng, Owen Kimball, Francis Kubala, John Makhoul, and Jeffrey Vandegrift
- Published
- 1989
164. White Paper on Spoken Language Systems.
- Author
-
John Makhoul, Frederick Jelinek, Lawrence R. Rabiner, Clifford J. Weinstein, and Victor Zue
- Published
- 1989
165. Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System.
- Author
-
Francis Kubala, Ming-Whei Feng, John Makhoul, and Richard M. Schwartz
- Published
- 1989
166. Research in Continuous Speech Recognition.
- Author
-
John Makhoul and Richard M. Schwartz
- Published
- 1989
167. Automatic Detection Of New Words In A Large Vocabulary Continuous Speech Recognition System.
- Author
-
Ayman Asadi, Richard M. Schwartz, and John Makhoul
- Published
- 1989
168. State of the art in continuous speech recognition
- Author
-
Richard Schwartz and John Makhoul
- Subjects
Vocabulary ,Speech perception ,Workstation ,Intelligent character recognition ,Computer science ,media_common.quotation_subject ,Speech recognition ,Word error rate ,Models, Psychological ,law.invention ,Automation ,User-Computer Interface ,Phonation ,Search algorithm ,law ,Humans ,Speech ,Language ,media_common ,Multidisciplinary ,Computers ,business.industry ,Communication ,Linguistics ,Markov Chains ,Speech Perception ,Voice ,State (computer science) ,business ,Algorithms ,Research Article - Abstract
In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers) have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.
- Published
- 1995
- Full Text
- View/download PDF
169. AGILE: Autonomous Global Integrated Language Exploitation
- Author
-
John Makhoul
- Subjects
Machine translation ,Computer science ,business.industry ,Foreign language ,computer.software_genre ,Serif ,Transcription (linguistics) ,Artificial intelligence ,Language translation ,business ,Software engineering ,Language industry ,computer ,Natural language ,Natural language processing ,Agile software development - Abstract
This is the final report for Year 3 of the GALE project, whose objective is to transcribe and translate foreign spoken and written languages into English and to distill the transcription into accurate information for use by our military. Below, we summarize the work performed by the BBN-led AGILE Team in Year 3. A more detailed description of the work performed can be found in the DARPA/IPTO Quarterly Status Reports for this project. The Appendix contains the accomplishments of four additional efforts: Serif Maturation, Broadcast Monitoring System One-Year Archive, Robust Automatic Transcription of Speech (RATS), and Serif Research.
- Published
- 2009
- Full Text
- View/download PDF
170. Recent progress in Arabic broadcast news transcription at BBN
- Author
-
Long Nguyen, John Makhoul, Bing Xiang, Mohamed Afify, and Sherif M. Abdou
- Subjects
Arabic ,Computer science ,Speech recognition ,language ,Transcription (software) ,language.human_language - Published
- 2005
- Full Text
- View/download PDF
171. The effects of speech recognition and punctuation on information extraction performance
- Author
-
Alex Baron, Ivan Bulyko, John Makhoul, Long Nguyen, Bing Xiang, Lance Ramshaw, Richard Schwartz, and David Stallard
- Subjects
Information extraction ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Artificial intelligence ,computer.software_genre ,business ,computer ,Punctuation ,Natural language processing ,media_common - Published
- 2005
- Full Text
- View/download PDF
172. The BBN RT04 English broadcast news transcription system
- Author
-
Mohamed Afify, Bing Xiang, Sherif M. Abdou, Richard Schwartz, Long Nguyen, John Makhoul, and Spyros Matsoukas
- Subjects
Training set ,Computer science ,business.industry ,Speech recognition ,Word error rate ,Lexicon ,computer.software_genre ,Discriminative model ,Artificial intelligence ,Language model ,Transcription (software) ,Cluster analysis ,business ,computer ,Natural language processing - Abstract
This paper describes the BBN English Broadcast News transcription system developed for the EARS Rich Transcription 2004 (RT04) evaluation. In comparison to the BBN RT03 system, we achieved around a 22% relative reduction in word error rate for all EARS BN development test sets. The use of additional acoustic training data acquired through Light Supervision based on thousands of hours of found data made the biggest contribution to the improvement. Better audio segmentation, through the use of an online speaker clustering algorithm and chopping speaker turns into moderately long utterances, also contributed substantially to the improvement. Other contributions, each modest but cumulatively significant, include using discriminative training for all acoustic models, using word duration as an additional knowledge source during N-best rescoring, and using updated lexicon and language models.
- Published
- 2005
- Full Text
- View/download PDF
173. Objective speech quality evaluation of narrowband LPC vocoders
- Author
-
W. Russell, R. Viswanathan, and John Makhoul
- Subjects
Narrowband ,Computer science ,Speech quality ,Speech recognition ,Speech synthesis ,Linear predictive coding ,computer.software_genre ,computer ,Utterance - Abstract
Several methods are presented for the objective speech quality evaluation of narrowband LPC vocoders, based on a framework that we proposed at the 1976 ICASSP conference. In each method, the error in short-term spectral behavior between vocoded speech and the original is computed once every 10 ms. These errors are appropriately weighted and averaged over an utterance to produce a single objective score. Several short-term error measures and several time-weighting and averaging techniques are investigated. We evaluate the objective methods by correlating the resulting objective scores with formal subjective speech quality judgments. The high correlations obtained indicate the usefulness of these methods.
- Published
- 2005
- Full Text
- View/download PDF
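Entry 173 evaluates vocoders by computing a short-term spectral error between vocoded and original speech every 10 ms and averaging it over the utterance. The sketch below implements one generic member of that family, an RMS log-spectral distance per frame with uniform averaging; the specific error measures and time weightings studied in the paper are not reproduced.

    import numpy as np

    def mean_log_spectral_distance(orig, coded, sr=8000, frame_ms=10, nfft=256):
        # Compare short-term magnitude spectra of the original and vocoded
        # signals once every frame_ms milliseconds and return the average
        # RMS log-spectral difference in dB over the whole utterance.
        hop = int(sr * frame_ms / 1000)
        n = min(len(orig), len(coded))
        win = np.hanning(nfft)
        dists = []
        for start in range(0, n - nfft, hop):
            S1 = np.abs(np.fft.rfft(orig[start:start + nfft] * win)) + 1e-8
            S2 = np.abs(np.fft.rfft(coded[start:start + nfft] * win)) + 1e-8
            dists.append(np.sqrt(np.mean((20.0 * np.log10(S1 / S2)) ** 2)))
        return float(np.mean(dists))

    # toy usage: compare a tone with a slightly noise-corrupted copy
    t = np.arange(8000) / 8000.0
    clean = np.sin(2 * np.pi * 200 * t)
    print(mean_log_spectral_distance(clean, clean + 0.01 * np.random.randn(8000)))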
174. Voice-excited LPC coders for 9.6 kbps speech transmission
- Author
-
W. Russell, John Makhoul, and R. Viswanathan
- Subjects
Codec2 ,Estimation theory ,Computer science ,Quantization (signal processing) ,Speech recognition ,Speech coding ,Baseband ,Speech synthesis ,Linear predictive coding ,computer.software_genre ,computer - Abstract
This paper considers the use of voice-excited linear predictive (LPC) coders for speech transmission at a bit-rate of 9.6 kbps. In our on-going work, we study in detail the various aspects of this class of speech coders, with the goal of maximizing the speech quality at the above rate. Important among these aspects are: baseband residual versus baseband speech transmission, coding of the baseband signal, and high-frequency regeneration from the baseband. We provide a discussion of these and other issues, and indicate a number of variables that have been included in our speech-quality optimization study. Experimental results obtained to date are summarized in the paper. More complete results and conclusions will be presented at the conference.
- Published
- 2005
- Full Text
- View/download PDF
175. Diphone synthesis for phonetic vocoding
- Author
-
D. Klatt, John Makhoul, Richard Schwartz, J. Klovstad, and Victor W. Zue
- Subjects
Sonorant ,Computer science ,Vowel ,Speech recognition ,String (computer science) ,Voice-onset time ,Context (language use) ,Speech synthesis ,computer.software_genre ,Linear predictive coding ,Diphone ,computer - Abstract
We report on the synthesis of speech in the context of a phonetic vocoder operating at 100 b/s. With each phoneme, the vocoder transmits the duration and a single pitch value. The synthesizer uses a large inventory of diphone "models" to synthesize a desired phoneme string. The diphone inventory has been selected to differentiate between prevocalic and postvocalic allophones of sonorants, to account for changes in vowel color conditioned by postvocalic liquids, to allow exact specification of voice onset time, and to permit synthesis of glottal stops, alveolar flaps, and syllabic consonants. The diphones are extracted from carefully constructed short utterances and are stored as a sequence of LPC parameters. During synthesis, the requisite diphone models are time-warped, abutted, and smoothed to produce a complete sequence of LPC parameters that are used in the synthesis. The algorithms used are described and compared with more conventional methods. Examples of the synthesized speech will be played.
- Published
- 2005
- Full Text
- View/download PDF
176. A mixed-source model for speech compression and synthesis
- Author
-
John Makhoul, R. Viswanathan, A. W. F. Huggins, and Richard Schwartz
- Subjects
Speech enhancement ,Noise ,Codec2 ,Computer Science::Sound ,Computer science ,Speech recognition ,Speech coding ,Voice ,Speech synthesis ,computer.software_genre ,Speech processing ,computer - Abstract
This paper presents an excitation source model for speech compression and synthesis, which allows for a degree of voicing by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency-selective manner. The mix is achieved by dividing the speech spectrum into two regions, with the pulse source exciting the low-frequency region and the noise source exciting the high-frequency region. A parameter Fc determines the degree of voicing by specifying the cut-off frequency between the voiced and unvoiced regions. For speech compression applications, Fc can be extracted automatically from the speech spectrum and transmitted. Experiments using the new model indicate its power in synthesizing natural-sounding voiced fricatives, and in largely eliminating the "buzzy" quality of vocoded speech. A functional definition of buzziness and naturalness is given in terms of the model.
- Published
- 2005
- Full Text
- View/download PDF
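Entry 176 describes an excitation source in which a pulse train drives the spectrum below a voicing cutoff Fc and noise drives it above. The sketch below builds such a mixed excitation in the frequency domain; the pulse/noise gain balance and the way Fc is estimated from real speech are omitted, so this only illustrates the mixing idea, not the paper's exact construction.

    import numpy as np

    def mixed_excitation(n_samples, f0=120.0, fc=2000.0, sr=8000):
        # Pulse-train spectrum below the voicing cutoff fc, white-noise
        # spectrum above it; the inverse FFT gives the mixed excitation.
        period = int(round(sr / f0))
        pulses = np.zeros(n_samples)
        pulses[::period] = 1.0
        noise = np.random.randn(n_samples)

        freqs = np.fft.rfftfreq(n_samples, d=1.0 / sr)
        P, N = np.fft.rfft(pulses), np.fft.rfft(noise)
        mixed = np.where(freqs < fc, P, N)     # voiced region vs. unvoiced region
        return np.fft.irfft(mixed, n=n_samples)

    print(mixed_excitation(4000).shape)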
177. Quality ratings of LPC vocoders: Effects of number of poles, quantization, and frame rate
- Author
-
A. W. F. Huggins, R. Viswanathan, and John Makhoul
- Subjects
symbols.namesake ,Computer science ,Speech recognition ,Quantization (signal processing) ,Bit rate ,symbols ,Linear predictive coding ,Frame rate ,Huffman coding ,Harmonic Vector Excitation Coding - Abstract
Four values for number of poles (13, 11, 9, 8) were combined factorially with three values of step size for quantization of log area ratios (0.5, 1, 2 dB), and with four values of frame rate (100, 67, 50, 33 per second), to define 48 LPC vocoder systems with overall bit rates ranging from 8.7 down to 1.3 kbps. Subjects rated the DEGRADATION of signal quality by each vocoder, for each of seven sentence tokens, chosen to challenge LPC vocoders maximally. The results define the combination of LPC parameters yielding the best speech quality for any desired overall bit rate.
- Published
- 2005
- Full Text
- View/download PDF
178. Methods for nonlinear spectral distortion of speech signals
- Author
-
John Makhoul
- Subjects
Signal processing ,ComputingMethodologies_PATTERNRECOGNITION ,Computer Science::Sound ,Computer science ,Nonlinear distortion ,Speech recognition ,Speech coding ,Cepstrum ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Image warping ,Linear predictive coding ,Speech processing - Abstract
Spectral distortion of speech signals without affecting the pitch or the speed of the signal has been difficult to achieve because of the need for pitch extraction. This paper presents a general analysis-synthesis scheme for the arbitrary spectral distortion of speech signals without the need for pitch extraction. Linear predictive warping, cepstral warping, and autocorrelation warping are given as examples of the general scheme. Applications include the unscrambling of helium speech, spectral compression for the hard of hearing, bit rate reduction in speech compression systems, and efficiency of spectral representation for speech recognition systems.
- Published
- 2005
- Full Text
- View/download PDF
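Entry 178 concerns distorting the short-time spectral envelope along an arbitrary frequency mapping without touching pitch or timing. The sketch below only illustrates the frequency-axis mapping itself, applied to a single magnitude spectrum by interpolation; the paper's pitch-free analysis-synthesis schemes (linear predictive, cepstral, and autocorrelation warping) are not reproduced here.

    import numpy as np

    def warp_spectrum(mag, warp):
        # Resample a magnitude spectrum onto a warped frequency axis.
        # `warp` maps normalized output frequency (0..1) to normalized input
        # frequency (0..1); warp(f) > f pulls spectral features downward in
        # frequency, warp(f) < f pushes them upward.
        n = len(mag)
        out_f = np.linspace(0.0, 1.0, n)
        in_f = np.clip(warp(out_f), 0.0, 1.0)
        return np.interp(in_f * (n - 1), np.arange(n), mag)

    mag = np.abs(np.fft.rfft(np.random.randn(512)))
    lowered = warp_spectrum(mag, lambda f: f ** 0.7)   # compress features toward low frequencies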
179. Narrowband LPC speech transmission over noisy channels
- Author
-
E. Blackman, W. Russell, John Makhoul, and R. Viswanathan
- Subjects
Voice activity detection ,Computer science ,Speech recognition ,Speech coding ,Speech synthesis ,computer.software_genre ,Linear predictive coding ,Speech processing ,Speech enhancement ,Channel capacity ,Narrowband ,Transmission (telecommunications) ,Header ,Synchronization (computer science) ,computer ,Communication channel - Abstract
Recently we described a variable-frame-rate LPC vocoder designed to transmit good-quality speech over 2400 bps fixed-rate noisy channels with bit-error probabilities ranging up to 5% [3]. The basic idea was to lower the data rate by transmitting LPC parameters only when speech characteristics have changed sufficiently since the last transmission, and to employ the resulting bit-rate savings for protecting important transmission data against channel noise. This paper describes our continuing efforts, which have concentrated on minimizing loss of synchronization between the receiver and the transmitter. In one approach, we emphasize heavy protection of the header and rapid resynchronization. Alternatively, we apply constraints which guarantee synchronization at the cost of some freedom in the selection of data for transmission. Results from the first approach are presented; results from both methods will be compared at the conference.
- Published
- 2005
- Full Text
- View/download PDF
180. Adaptive lattice methods for linear prediction
- Author
-
R. Viswanathan and John Makhoul
- Subjects
Adaptive filter ,symbols.namesake ,Sequential estimation ,Control theory ,Wiener filter ,symbols ,Kernel adaptive filter ,Linear predictive analysis ,Linear prediction ,Digital filter ,Orthogonalization ,Algorithm ,Mathematics - Abstract
A general method for adaptive updating of lattice coefficients in the linear predictive analysis of nonstationary signals is presented. The method is given as one of two sequential estimation methods, the other being a block sequential estimation method. The fast convergence of adaptive lattice algorithms is seen to be due to the orthogonalization and decoupling properties of the lattice. These properties are useful in adaptive Wiener filtering. As an application, a new fast start-up equalizer structure is presented. In addition, a one-multiplier form of the lattice is presented, which results in a reduction of computations.
- Published
- 2005
- Full Text
- View/download PDF
181. Time and frequency domain noise shaping in speech coding
- Author
-
M. Berouti, M. Krasner, and John Makhoul
- Subjects
Computer science ,Speech recognition ,Speech coding ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Noise figure ,Noise shaping ,symbols.namesake ,Signal-to-noise ratio ,Distortion ,Phase noise ,Effective input noise temperature ,Value noise ,Noise temperature ,Noise measurement ,Quantization (signal processing) ,Temporal noise ,White noise ,Noise floor ,Speech enhancement ,Gradient noise ,Noise ,Colors of noise ,Gaussian noise ,Frequency domain ,Bit rate ,symbols ,Harmonic Vector Excitation Coding - Abstract
We present a general framework for the coding of signals with a time-varying short-term spectrum. The total quantization distortion is minimized by a dynamic bit allocation in time and in frequency. The resulting distortion is then white noise with fixed power. Such noise, however, is not optimal perceptually when coding speech signals. Spectral and temporal noise shaping are then used to minimize the perception of noise in frequency and in time. Initial experiments at 16 kb/s demonstrate the validity of shaping the noise in time as well as in frequency.
- Published
- 2005
- Full Text
- View/download PDF
182. LPCW: An LPC vocoder with linear predictive spectral warping
- Author
-
John Makhoul and L. Cosell
- Subjects
Speech perception ,Computer Science::Sound ,Computer science ,Spectral envelope ,Speech recognition ,Selectable Mode Vocoder ,Autocorrelation ,Bandwidth (signal processing) ,Linear prediction ,Image warping ,Linear predictive coding - Abstract
In ordinary linear prediction the speech spectral envelope is modeled by an all-pole spectrum. The error criterion employed guarantees a uniform fit across the whole frequency range. However, we know from speech perception studies that low frequencies are more important than high frequencies for perception. Therefore, a minimally redundant model would strive to achieve a uniform perceptual fit across the spectrum, which means that it should be able to represent low frequencies more accurately than high frequencies. This is achieved in the LPCW vocoder: an LPC vocoder employing our recently developed method of linear predictive warping (LPW). The result is improved speech quality for the same bit rate.
- Published
- 2005
- Full Text
- View/download PDF
183. Enhancement of speech corrupted by acoustic noise
- Author
-
Richard Schwartz, John Makhoul, and M. Berouti
- Subjects
Computer science ,Speech recognition ,Noise reduction ,Intelligibility (communication) ,Noise figure ,Noise (electronics) ,Background noise ,symbols.namesake ,Signal-to-noise ratio ,Noise generator ,Phase noise ,Waveform ,Effective input noise temperature ,Value noise ,Noise temperature ,Noise measurement ,Noise spectral density ,Spectral density ,Salt-and-pepper noise ,Noise floor ,Speech enhancement ,Gradient noise ,Burst noise ,Noise ,Computer Science::Sound ,Colors of noise ,Gaussian noise ,symbols ,Noise (radio) - Abstract
This paper describes a method for enhancing speech corrupted by broadband noise. The method is based on the spectral noise subtraction method. The original method entails subtracting an estimate of the noise power spectrum from the speech power spectrum, setting negative differences to zero, recombining the new power spectrum with the original phase, and then reconstructing the time waveform. While this method reduces the broadband noise, it also usually introduces an annoying "musical noise". We have devised a method that eliminates this "musical noise" while further reducing the background noise. The method consists in subtracting an overestimate of the noise power spectrum, and preventing the resultant spectral components from going below a preset minimum level (spectral floor). The method can automatically adapt to a wide range of signal-to-noise ratios, as long as a reasonable estimate of the noise spectrum can be obtained. Extensive listening tests were performed to determine the quality and intelligibility of speech enhanced by our method. Listeners unanimously preferred the quality of the processed speech. Also, for an input signal-to-noise ratio of 5 dB, there was no loss of intelligibility associated with the enhancement technique.
- Published
- 2005
- Full Text
- View/download PDF
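Entry 183 describes subtracting an overestimate of the noise power spectrum and imposing a spectral floor to suppress the residual "musical noise". Below is a minimal single-frame sketch of that rule; the oversubtraction factor, floor level, windowing, and overlap-add resynthesis are illustrative choices, not the paper's tuned values.

    import numpy as np

    def spectral_subtract_frame(noisy_frame, noise_psd, alpha=4.0, beta=0.02):
        # Over-subtraction with a spectral floor: subtract alpha times the
        # estimated noise power from the noisy power spectrum, but never let
        # any bin fall below beta times the noise power; reuse the noisy phase.
        spec = np.fft.rfft(noisy_frame * np.hanning(len(noisy_frame)))
        power = np.abs(spec) ** 2
        cleaned = np.maximum(power - alpha * noise_psd, beta * noise_psd)
        enhanced = np.sqrt(cleaned) * np.exp(1j * np.angle(spec))
        return np.fft.irfft(enhanced, n=len(noisy_frame))

    frame = np.random.randn(256)                   # stand-in for one noisy frame
    noise_psd = np.full(129, np.var(frame))        # crude noise estimate (129 = 256/2 + 1 bins)
    print(spectral_subtract_frame(frame, noise_psd)[:4])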
184. A preliminary design of a phonetic vocoder based on a diphone model
- Author
-
J. Sorensen, J. Klovstad, John Makhoul, and Richard Schwartz
- Subjects
MBROLA ,Computer science ,Speech recognition ,String (computer science) ,Speech synthesis ,computer.software_genre ,Diphone ,computer - Abstract
We report on the initial development of a phonetic vocoder operating at 100 b/s. With each phoneme, the vocoder transmits the duration and a single pitch value. The synthesizer uses a large inventory of diphone templates to synthesize a desired phoneme string. To determine a phoneme string from input speech, the analyzer takes into account the synthesis model by using the same inventory of diphone templates, augmented by additional diphone templates to account for alternate pronunciations. The phoneme string is chosen to minimize the difference between the diphone templates and the input speech according to a distance measure.
- Published
- 2005
- Full Text
- View/download PDF
185. High-frequency regeneration in speech coding systems
- Author
-
John Makhoul and M. Berouti
- Subjects
Excitation signal ,Computer science ,Speech recognition ,Speech coding ,Speech synthesis ,Linear predictive coding ,computer.software_genre ,Noise ,Band-pass filter ,Frequency domain ,Baseband ,computer ,Algorithm ,Harmonic Vector Excitation Coding ,Transform coding ,Energy (signal processing) - Abstract
The traditional method of high-frequency regeneration (HFR) of the excitation signal in baseband coders has been to rectify the transmitted baseband, followed by spectral flattening. In addition, a noise source is added at high frequencies to compensate for lack of energy during certain sounds. In this paper, we reexamine the whole HFR process. We show that the degree of rectification does not affect the output speech, and that, with proper processing, the high-frequency noise source may be eliminated. We introduce a new type of HFR based on spectral duplication of the baseband. Two types of spectral duplication are presented: spectral folding and spectral translation. Finally, in order to eliminate the problem of breaking the harmonic structure due to spectral duplication, we propose a pitch-adaptive spectral duplication scheme in the frequency domain by using adaptive transform coding to code the baseband.
- Published
- 2005
- Full Text
- View/download PDF
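Entry 185 introduces high-frequency regeneration by spectral duplication of the transmitted baseband, either folding (mirroring) or translation (copying). The sketch below shows the two duplication modes on a single excitation frame; the pitch-adaptive duplication and the baseband coding itself are omitted.

    import numpy as np

    def regenerate_high_band(baseband_frame, fold=True):
        # Fill the upper half of the excitation spectrum from the lower half:
        # spectral folding mirrors the baseband about its upper edge, while
        # spectral translation copies it upward unchanged.
        spec = np.fft.rfft(baseband_frame)
        half = len(spec) // 2
        low = spec[:half]
        spec[half:half + len(low)] = low[::-1] if fold else low
        return np.fft.irfft(spec, n=len(baseband_frame))

    x = np.random.randn(512)
    folded = regenerate_high_band(x, fold=True)
    translated = regenerate_high_band(x, fold=False)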
186. Speech-quality optimization of 16 kb/s adaptive predictive coders
- Author
-
W. Russell, R. Viswanathan, M. Berouti, A. Higgins, and John Makhoul
- Subjects
Adaptive coding ,Control theory ,Quantization (signal processing) ,Speech coding ,Adaptive predictive coding ,Feedback loop ,Harmonic Vector Excitation Coding ,Noise shaping ,Mathematics ,Loop gain - Abstract
This paper considers the optimization of speech quality of adaptive predictive coding (APC) systems for transmission over a synchronous 16 kb/s channel. Among the important issues included in this on-going optimization study are: comparative evaluation of several methods for adaptive coding of the APC residual; comparative testing of several methods for adaptive shaping of the spectrum of the quantization noise; and optimization of various parameter values and their bit allocation. In addition to reporting the results from this study, we report on the occurrence of "limit cycles" or regions of excessive quantization noise build-up, offer an explanation for their cause in terms of the feedback gain (or "loop gain") of the APC transmitter, and present experimental results obtained using several remedial means for this problem. Also, we report on the relative properties of different APC coder configurations. Based on the results of our optimization study to date, we present several optimized APC systems.
- Published
- 2005
- Full Text
- View/download PDF
187. A framework for the objective evaluation of vocoder speech quality
- Author
-
W. Russell, John Makhoul, and R. Viswanathan
- Subjects
Speech enhancement ,Noise ,Noise measurement ,Computer science ,Speech quality ,Speech recognition ,PSQM ,Objective evaluation ,Intelligibility (communication) ,Speech processing - Abstract
While there exist methods in the literature for objectively evaluating the intelligibility of speech in the presence of stationary noise, little has been done regarding the objective evaluation of either the intelligibility or the quality of vocoded speech. We present a framework within which we have begun a step-by-step program to develop objective measures for vocoded speech quality that are consistent with results from subjective tests.
- Published
- 2005
- Full Text
- View/download PDF
188. Baseband LPC coders for speech transmission over 9.6 kb/s noisy channels
- Author
-
A. Higgins, W. Russell, John Makhoul, and R. Viswanathan
- Subjects
Signal processing ,Voice activity detection ,Computer science ,Speech recognition ,Line code ,Bandwidth (signal processing) ,Baseband ,Data_CODINGANDINFORMATIONTHEORY ,Speech processing ,Linear predictive coding ,Communication channel - Abstract
This paper presents the results of our investigation of the various aspects of baseband LPC coders with the goal of maximizing the speech quality at a transmission bit-rate of 9.6 kb/s and for channel bit-error rates of up to 1%. Important among these aspects are: baseband width, coding of baseband, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder.
- Published
- 2005
- Full Text
- View/download PDF
189. Stability analysis of APC systems
- Author
-
M. Berouti, M. Krasner, and John Makhoul
- Subjects
Power gain ,Control theory ,Computer science ,Quantization (signal processing) ,Bit rate ,Speech coding ,Adaptive predictive coding ,Feedback loop ,Harmonic Vector Excitation Coding - Abstract
Adaptive predictive coding (APC) is a useful technique for high-quality digital encoding of speech signals at medium band data rates. A particularly annoying degradation in APC systems, however, is the presence of "glitches" or "beeps" in the output speech. These correspond to frames with signal-to-quantization-noise ratios (S/Q) that are often less than unity, i.e., negative in dB. We show that these degradations are caused by instabilities of the APC system. This paper provides a method for analyzing the stability of APC systems as a function of the encoding bit rate, the quantization algorithm, and the power gain of the APC loop feedback filter. Two methods of improving system performance based on the results of the stability analysis are then developed. Experimental evaluation by listening tests confirm the validity of the techniques.
- Published
- 2005
- Full Text
- View/download PDF
190. Towards perceptually consistent measures of spectral distance
- Author
-
W. Russell, R. Viswanathan, and John Makhoul
- Subjects
Speech enhancement ,Formant ,Speech perception ,Speech recognition ,Perception ,media_common.quotation_subject ,Intelligibility (communication) ,Linear predictive coding ,Speech processing ,Distance measures ,media_common ,Mathematics - Abstract
This paper considers distance measures for determining the deviation between two smoothed short-time speech spectra. Since such distance measures are employed in speech processing applications that either involve or relate to human perceptual judgment, the effectiveness of these measures will be enhanced if they provide results consistent with human speech perception. As a first step, we suggest Flanagan's results on difference limens for formant frequencies as one basis for checking the perceptual consistency of a measure. A general necessary condition for perceptual consistency is derived for a class of spectral distance measures. A class of perceptually consistent measures obtained through experimental investigations is then described, and results obtained using one such measure under Flanagan's test conditions are presented.
- Published
- 2005
- Full Text
- View/download PDF
191. BYBLOS: The BBN continuous speech recognition system
- Author
-
Richard Schwartz, G. Kubala, S. Roucos, John Makhoul, Owen Kimball, M. Krasner, Patti Price, M. Dunham, and Yen-Lu Chow
- Subjects
Vocabulary ,Perplexity ,Grammar ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Speech corpus ,computer.software_genre ,Rule-based machine translation ,Word recognition ,Artificial intelligence ,Hidden Markov model ,business ,Coarticulation ,computer ,Natural language processing ,media_common - Abstract
In this paper, we describe BYBLOS, the BBN continuous speech recognition system. The system, designed for large vocabulary applications, integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance. The basic approach, as described in previous papers [1, 2], makes extensive use of robust context-dependent models of phonetic coarticulation using Hidden Markov Models (HMM). We describe the components of the BYBLOS system, including the signal processing frontend, dictionary, phonetic model training system, word model generator, grammar, and decoder. In recognition experiments, we demonstrate consistently high word recognition performance on continuous speech across speakers, task domains, and grammars of varying complexity. In speaker-dependent mode, where 15 minutes of speech is required for training each speaker, 98.5% word accuracy has been achieved in continuous speech for a 350-word task, using grammars with perplexity ranging from 30 to 60. With only 15 seconds of training speech, we demonstrate performance of 97% using a grammar.
- Published
- 2005
- Full Text
- View/download PDF
192. Improved hidden Markov modeling of phonemes for continuous speech recognition
- Author
-
Richard Schwartz, M. Krasner, John Makhoul, S. Roucos, and Y. L. Chow
- Subjects
Context model ,Training set ,Computer science ,business.industry ,Speech recognition ,Variable-order Markov model ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Pattern recognition ,Probability density function ,Markov model ,ComputingMethodologies_PATTERNRECOGNITION ,Computer Science::Sound ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Markov property ,Hidden semi-Markov model ,Artificial intelligence ,Hidden Markov model ,business ,Parametric statistics - Abstract
This paper discusses the use of the Hidden Markov Model (HMM) in phonetic recognition. In particular, we present improvements that deal with the problem of modeling the effect of phonetic context and the problem of robust pdf estimation. The effect of phonetic context is taken into account by conditioning the probability density functions (pdfs) of the acoustic parameters on the adjacent phonemes, only to the extent that there are sufficient tokens of the phoneme in that context. This partial conditioning is achieved by combining the conditioned and unconditioned pdf models with weights that depend on the confidence in each pdf estimate. This combination is shown to result in better performance than either model by itself. We also show that it is possible to obtain the computational advantages of using discrete probability densities without the usual requirement for large amounts of training data.
- Published
- 2005
- Full Text
- View/download PDF
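Entry 192 describes conditioning phoneme pdfs on their context only to the extent that enough training tokens exist, by combining conditioned and unconditioned estimates with confidence-dependent weights. The sketch below uses a simple count-based weight lambda = N / (N + k), which is a common choice but not necessarily the weighting used in the paper.

    import numpy as np

    def smoothed_context_pdf(cd_counts, ci_counts, k=20.0):
        # Interpolate a context-dependent discrete pdf with its
        # context-independent backoff; the context-dependent estimate gets a
        # weight that grows with the number N of tokens it was trained on.
        cd = np.asarray(cd_counts, dtype=float)
        ci = np.asarray(ci_counts, dtype=float)
        N = cd.sum()
        lam = N / (N + k)
        cd_pdf = cd / N if N > 0 else np.zeros_like(cd)
        ci_pdf = ci / ci.sum()
        return lam * cd_pdf + (1.0 - lam) * ci_pdf

    # a phoneme seen only 12 times in this particular context vs. 900 times overall
    print(smoothed_context_pdf([3, 7, 2, 0], [200, 300, 250, 150]))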
193. New lattice methods for linear prediction
- Author
-
John Makhoul
- Subjects
Mathematical optimization ,Lattice (order) ,Quantization (signal processing) ,Computation ,Autocorrelation ,Linear prediction ,Covariance ,Algorithm ,Lattice multiplication ,Mathematics - Abstract
This paper presents a new formulation for linear prediction, which we call the covariance lattice method. The method is viewed as one of a class of lattice methods which guarantee the stability of the all-pole filter, with or without windowing of the signal, with finite wordlength computations, and with the number of computations being comparable to the traditional autocorrelation and covariance methods. In addition, quantization of the reflection coefficients can be accomplished within the recursion for retention of accuracy in representation.
- Published
- 2005
- Full Text
- View/download PDF
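Entry 193 presents the covariance lattice formulation of linear prediction, whose key property is that the reflection coefficients guarantee a stable all-pole filter. The sketch below shows the closely related Burg-type lattice recursion, which shares that stability guarantee; it is not the covariance lattice error criterion of the paper.

    import numpy as np

    def lattice_reflection_coeffs(x, order):
        # Burg-type lattice recursion: update forward/backward prediction
        # errors stage by stage and pick the reflection coefficient that
        # minimizes their combined energy.  Each |k| <= 1, so the implied
        # all-pole filter is stable.
        f = np.asarray(x, dtype=float).copy()    # forward error
        b = f.copy()                             # backward error
        ks = []
        for _ in range(order):
            fp, bp = f[1:], b[:-1]
            k = -2.0 * np.dot(fp, bp) / (np.dot(fp, fp) + np.dot(bp, bp))
            ks.append(k)
            f, b = fp + k * bp, bp + k * fp
        return ks

    sig = np.sin(0.3 * np.arange(200)) + 0.1 * np.random.randn(200)
    print(lattice_reflection_coeffs(sig, 4))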
194. Context-dependent modeling for acoustic-phonetic recognition of continuous speech
- Author
-
John Makhoul, M. Krasner, S. Roucos, Richard Schwartz, Y. L. Chow, and Owen Kimball
- Subjects
Context model ,Computer science ,business.industry ,Speech recognition ,Acoustic model ,Pattern recognition ,Viterbi algorithm ,Markov model ,Speaker recognition ,ComputingMethodologies_ARTIFICIALINTELLIGENCE ,Sequence labeling ,symbols.namesake ,ComputingMethodologies_PATTERNRECOGNITION ,symbols ,Artificial intelligence ,Hidden Markov model ,business ,Signature recognition - Abstract
This paper describes the results of our work in designing a system for phonetic recognition of unrestricted continuous speech. We describe several algorithms used to recognize phonemes using context-dependent Hidden Markov Models of the phonemes. We present results for several variations of the parameters of the algorithms. In addition, we propose a technique that makes it possible to integrate traditional acoustic-phonetic features into a hidden Markov process. The categorical decisions usually associated with heuristic acoustic-phonetic algorithms are replaced by automated training techniques and global search strategies. The combination of general spectral information and specific acoustic-phonetic features is shown to result in more accurate phonetic recognition than either representation by itself.
- Published
- 2005
- Full Text
- View/download PDF
195. Using quick transcriptions to improve conversational speech models
- Author
-
Chia-Lin Kao, Teodoro Arvizo, Rukmini Iyer, Owen Kimball, and John Makhoul
- Subjects
Conversational speech ,Computer science ,Speech recognition ,Speech corpus - Published
- 2004
- Full Text
- View/download PDF
196. Classification capabilities of two-layer neural nets
- Author
-
R. Schwartz, A. El-Jaroudi, and John Makhoul
- Subjects
Physical neural network ,Artificial neural network ,Time delay neural network ,business.industry ,Computer science ,Feed forward ,Sigmoid function ,Rectifier (neural networks) ,Topology ,Expression (mathematics) ,Nonlinear system ,Feedforward neural network ,Artificial intelligence ,Layer (object-oriented design) ,business - Abstract
The authors consider the classification capabilities of feedforward two-layer neural nets with a single hidden layer and having threshold units only; that is, they consider the type of decision regions that two-layer nets are capable of forming in the input space. It had been asserted previously that such nets are capable of forming only convex decision regions or nonconvex but connected regions. The authors show that two-layer nets are capable of forming disconnected decision regions as well. In addition to giving examples of the phenomena, they explain why and how disconnected decision regions are formed. They also derive an expression for the number of cells in the input space that are to be grouped together to form the decision regions. This expression can be useful in deciding how many nodes to have in the first layer. The results have bearing on neural networks where the nonlinear elements are smooth (sigmoid) functions rather than threshold functions.
- Published
- 2003
- Full Text
- View/download PDF
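Entry 196 shows that a two-layer net of threshold units can form disconnected decision regions. The toy construction below (mine, not the paper's) realizes the disconnected region {x1 >= 1} union {x1 <= -1} in the plane with two hidden threshold units whose outputs are ORed by the output unit.

    import numpy as np

    def step(z):
        return (z >= 0).astype(float)

    def two_layer_net(x1, x2):
        # Hidden unit 1 fires on the half-plane x1 >= 1, hidden unit 2 on the
        # half-plane x1 <= -1 (x2 is ignored by this construction); the output
        # unit ORs them, so the decision region is two disjoint vertical strips.
        h1 = step(x1 - 1.0)
        h2 = step(-x1 - 1.0)
        return step(h1 + h2 - 0.5)

    for x in (-2.0, 0.0, 2.0):
        print(x, two_layer_net(np.array(x), np.array(0.0)))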
197. A compact model for speaker-adaptive training
- Author
-
J. McDonough, Richard Schwartz, John Makhoul, and Tasos Anastasakos
- Subjects
education.field_of_study ,Estimation theory ,business.industry ,Computer science ,Speech recognition ,Population ,Word error rate ,Context (language use) ,Pattern recognition ,Electronic mail ,Variation (linguistics) ,Loudspeaker ,Artificial intelligence ,Hidden Markov model ,education ,business - Abstract
We formulate a novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition. It is motivated by the fact that variability in SI acoustic models is attributed both to phonetic variation and to variation among the speakers of the training population that is independent of the information content of the speech signal. These two variation sources are decoupled, and the proposed method jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models. We compare the proposed training algorithm to the common SI training paradigm within the context of supervised adaptation. We show that the proposed acoustic models are more efficiently adapted to the test speakers, thus achieving significant overall word error rate reductions of 19% and 25% for 20K and 5K vocabulary tasks, respectively.
- Published
- 2002
- Full Text
- View/download PDF
198. Multi-font recognition of printed Arabic using the BBN BYBLOS speech recognition system
- Author
-
C. Raphael, Richard Schwartz, John Makhoul, C. LaPre, and Ying Zhao
- Subjects
business.industry ,Character (computing) ,Computer science ,Arabic ,Intelligent character recognition ,Speech recognition ,Feature extraction ,Image segmentation ,Optical character recognition ,computer.software_genre ,Intelligent word recognition ,language.human_language ,Handwriting recognition ,Font ,language ,Artificial intelligence ,business ,Hidden Markov model ,computer ,Natural language processing - Abstract
We use a hidden Markov model (HMM) based continuous speech recognition system to perform off-line character recognition (OCR) of Arabic printed text. The HMM trainer and recognizer are used without change; however, we modify the feature extraction stage to compute features relevant to OCR. Although we begin by segmenting the page into a collection of lines, no further segmentation is necessary for either recognition or training. Experiments on the ARPA Arabic data corpus yield a range of character error rates from under one percent for a single computer font to 2.8% for multiple-font recognition of a wide range of material from books, magazines, and newspapers.
- Published
- 2002
- Full Text
- View/download PDF
199. Comparative experiments on large vocabulary speech recognition
- Author
-
Richard Schwartz, Francis Kubala, G. Zavaliagkos, John Makhoul, Long Nguyen, and Tasos Anastasakos
- Subjects
Vocabulary ,Voice activity detection ,Artificial neural network ,Dictation ,Microphone ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Codebook ,Speaker recognition ,computer.software_genre ,Feature (machine learning) ,Selection (linguistics) ,Independence (mathematical logic) ,Speech analytics ,Language model ,Artificial intelligence ,Hidden Markov model ,business ,computer ,Natural language ,Natural language processing ,media_common ,Spoken language - Abstract
We describe recent changes to the BYBLOS system's training and recognition algorithms and report on numerous experiments in large vocabulary speech recognition. In earlier work, we performed five key experiments that were designed to answer questions related to different training scenarios. We investigated (1) the effect of varying the number of training speakers if the total amount of training data remains constant, (2) data pooling versus model averaging for generating speaker-independent (SI) HMMs, (3) the benefit of doubling the acoustic training data, (4) SI versus speaker-dependent (SD) performance when the SI training data is twelve times greater, and (5) the effect of cross-domain training for both the acoustic and language models. Our recent work focused on four specific problem areas sharing the common thread that the test condition exposes the recognizer to phenomena not observed in the training data. Here we investigated (1) words outside the vocabulary, (2) spoken language effects due to subject variability and spontaneous dictation, (3) non-native dialects of the language, and (4) new microphones not used in training.
- Published
- 2002
- Full Text
- View/download PDF
200. On-line cursive handwriting recognition using speech recognition methods
- Author
-
Richard Schwartz, G. Chou, Thad Starner, and John Makhoul
- Subjects
business.industry ,Computer science ,Intelligent character recognition ,Feature vector ,Speech recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Word error rate ,computer.software_genre ,Lexicon ,ComputingMethodologies_PATTERNRECOGNITION ,Handwriting recognition ,Handwriting ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Artificial intelligence ,business ,Hidden Markov model ,computer ,Natural language processing ,Word (computer architecture) - Abstract
A hidden Markov model (HMM) based continuous speech recognition system is applied to on-line cursive handwriting recognition. The base system is unmodified except for using handwriting feature vectors instead of speech. Due to inherent properties of HMMs, segmentation of the handwritten script sentences is unnecessary. A 1.1% word error rate is achieved for a 3050 word lexicon, 52 character, writer-dependent task and 3%-5% word error rates are obtained for six different writers in a 25,595 word lexicon, 86 character, writer-dependent task. Similarities and differences between the continuous speech and on-line cursive handwriting recognition tasks are explored; the handwriting database collected over the past year is described; and specific implementation details of the handwriting system are discussed.
- Published
- 2002
- Full Text
- View/download PDF