64 results on '"James K. Baker"'
Search Results
2. An investigation of acoustic models for multilingual code-switching.
- Author
-
Christopher M. White, Sanjeev Khudanpur, and James K. Baker
- Published
- 2008
- Full Text
- View/download PDF
3. Large vocabulary continuous speech recognition of Wall Street Journal data.
- Author
-
Robert Roth, James K. Baker, Janet M. Baker, Larry Gillick, Melvyn J. Hunt, Yoshiko Ito, Stephen Lowe, Jeremy Orloff, Barbara Peskin, and Francesco Scattone
- Published
- 1993
- Full Text
- View/download PDF
4. Application of large vocabulary continuous speech recognition to topic and speaker identification using telephone speech.
- Author
-
Lawrence Gillick, James K. Baker, Janet M. Baker, John S. Bridle, Melvyn J. Hunt, Yoshiko Ito, Stephen Lowe, Jeremy Orloff, Barbara Peskin, Robert Roth, and Francesco Scattone
- Published
- 1993
- Full Text
- View/download PDF
5. On the interaction between true source, training, and testing language models.
- Author
-
Douglas B. Paul, James K. Baker, and Janet M. Baker
- Published
- 1991
- Full Text
- View/download PDF
6. Topic and Speaker Identification via Large Vocabulary Continuous Speech Recognition.
- Author
-
Barbara Peskin, Larry Gillick, Yoshiko Ito, Stephen Lowe, Robert Roth, Francesco Scattone, James K. Baker, Janet M. Baker, John S. Bridle, Melvyn J. Hunt, and Jeremy Orloff
- Published
- 1993
7. Large Vocabulary Recognition of Wall Street Journal Sentences at Dragon Systems.
- Author
-
James K. Baker, Janet M. Baker, Paul Bamberg, Kathleen Bishop, Larry Gillick, Vera Helman, Zezhen Huang, Yoshiko Ito, Stephen Lowe, Barbara Peskin, Robert Roth, and Francesco Scattone
- Published
- 1992
8. DRAGON Systems Resource Management Benchmark Results February 1991.
- Author
-
James K. Baker, Janet M. Baker, Paul Bamberg, Larry Gillick, Lori Lamel, Robert Roth, Francesco Scattone, Dean Sturtevant, Ousmane Ba, and Richard Benedict
- Published
- 1991
9. On the Interaction Between True Source, Training, and Testing Language Models.
- Author
-
Douglas B. Paul, James K. Baker, and Janet M. Baker
- Published
- 1990
10. Automatic recognition of continuously spoken sentences from a finite state grammar.
- Author
-
Lalit R. Bahl, James K. Baker, Paul S. Cohen, A. G. Cole, Frederick Jelinek, Burn L. Lewis, and Robert L. Mercer
- Published
- 1978
- Full Text
- View/download PDF
11. Partial traceback and dynamic programming.
- Author
-
Peter F. Brown, James C. Spohrer, Peter H. Hochschild, and James K. Baker
- Published
- 1982
- Full Text
- View/download PDF
12. Cost-effective speech processing.
- Author
-
James K. Baker, Janet M. Baker, Robert Roth, and Pard Bamberg
- Published
- 1984
- Full Text
- View/download PDF
13. Preliminary results on the performance of a system for the automatic recognition of continuous speech.
- Author
-
Lalit R. Bahl, James K. Baker, Paul S. Cohen, N. R. Dixon, Frederick Jelinek, Robert L. Mercer, and Harvey F. Silverman
- Published
- 1976
- Full Text
- View/download PDF
14. Recognition of continuously read natural corpus.
- Author
-
Lalit R. Bahl, James K. Baker, Paul S. Cohen, Frederick Jelinek, Burn L. Lewis, and Robert L. Mercer
- Published
- 1978
- Full Text
- View/download PDF
15. Dragon.
- Author
-
Janet M. Baker and James K. Baker
- Published
- 1989
16. Vascular Injury in Anterior Lumbar Surgery
- Author
-
James K. Baker, Patrick R. Reardon, Michael J. Reardon, and Michael H. Heggeness
- Subjects
Adult; Male; medicine.medical_specialty; Rectus Abdominis; Vena Cava, Inferior; Iliac Vein; Inferior vena cava; Lumbar; Humans; Medicine; Retroperitoneal space; Orthopedics and Sports Medicine; Intraoperative Complications; Rectus abdominis muscle; Retrospective Studies; Lumbar Vertebrae; business.industry; Vascular disease; Incidence; medicine.disease; Surgery; medicine.anatomical_structure; medicine.vein; Great vessels; Female; Spinal Diseases; Iliolumbar Vein; Neurology (clinical); Radiology; business; Intervertebral Disc Displacement; Common iliac vein
- Abstract
Anterior approaches to the lumbar spine are rapidly gaining popularity for decompressive and reconstructive procedures. A recognized hazard to this approach to the spine is possible injury to the great vessels. This retrospective study is a review of 102 consecutive anterior lumbar spinal procedures. All approaches were performed by one of two fellowship-trained vascular surgeons. Both have extensive experience with this approach. All injuries to the inferior vena cava, common iliac vein, or other great vessels that required suture repair were recorded. The authors were surprised to note an overall rate for this vascular complication of 15.6%. These injuries included 11 tears of the common iliac vein, four tears of the inferior vena cava, and one avulsion of the iliolumbar vein. Two different approaches were used during this study. Twenty-six cases were performed through a flank incision, with the dissection proceeding through the external and internal oblique muscles as well as the transversus abdominis. The average number of levels exposed was 2.3. Two vascular complications resulted, for an incidence of 7.7%. Seventy-six procedures were carried out through a small (5-10 cm) incision overlying the rectus abdominis muscle. The retroperitoneal space was entered through the posterior rectus sheath without division of any muscle tissue. This resulted in 14 vascular complications, for an incidence of 18.4%. Although the authors are unaware of any major long-term morbidity from this complication in their patient group, they believe that the true incidence of this potentially quite serious complication may be underestimated. (ABSTRACT TRUNCATED AT 250 WORDS)
- Published
- 1993
- Full Text
- View/download PDF
17. ANSWER PLEASE
- Author
-
James K Baker, Charles T Stephenson, and Hugh S Tullos
- Subjects
Orthopedics and Sports Medicine; Surgery
- Published
- 1993
- Full Text
- View/download PDF
18. Lymphadenitis in an 18-month-old traveler to Mexico
- Author
-
Susan E. Dorman, James K. Baker, Tafadzwa S. Kasambira, Nancy Hooper, and Karla Alwood
- Subjects
Microbiology (medical); Pediatrics; medicine.medical_specialty; Tuberculosis; Lymphadenitis; medicine; Travel medicine; Humans; Mexico; Mycobacterium bovis; Travel; biology; business.industry; Infant; biology.organism_classification; medicine.disease; Virology; Vaccination; Infectious Diseases; Pediatrics, Perinatology and Child Health; Female; Lymph Nodes; business; Neck
- Published
- 2007
19. Cyst of the Ligamentum Flavum
- Author
-
James K. Baker and Gregory W. Hanson
- Subjects
musculoskeletal diseases; Radicular Syndrome; medicine.medical_specialty; Diagnosis, Differential; Sciatica; Surgical removal; medicine; Humans; Orthopedics and Sports Medicine; Cyst; Hemorrhagic cyst; Cysts; business.industry; Nerve Compression Syndromes; Middle Aged; musculoskeletal system; medicine.disease; Surgery; Ligamentum Flavum; Synovial Cyst; Female; Lumbar spine; Neurology (clinical); medicine.symptom; Spinal Nerve Roots; business
- Abstract
A patient with sciatica resulting from a hemorrhagic cyst of the ligamentum flavum is reported. Surgical removal of the cyst relieved the patient of her sciatic symptoms. Cyst of ligamentum flavum is an uncommon cause of radiculopathy. There are 13 cases in the literature. The differential diagnosis of benign interspinal extradural mass lesions includes spinal synovial cysts, ligamentum flavum cysts, perineural cysts, dermoid cysts, and parasitic cysts.
- Published
- 1994
- Full Text
- View/download PDF
20. Large vocabulary natural language speech recognition in software.
- Author
-
James K. Baker and Janet M. Baker
- Published
- 1987
21. Arthrodesis of the ankle with lateral plating
- Author
-
Hugh S. Tullos, James K. Baker, and W. Grant Braly
- Subjects
Male; medicine.medical_specialty; medicine.medical_treatment; Arthrodesis; Arthritis; Postoperative Complications; medicine; Internal fixation; Humans; Orthopedics and Sports Medicine; Malunion; Aged; Retrospective Studies; business.industry; Middle Aged; medicine.disease; Arthroplasty; Internal Fixators; Surgery; Radiography; medicine.anatomical_structure; Treatment Outcome; Rheumatoid arthritis; Female; Ankle; business; Cancellous bone; Ankle Joint
- Abstract
A modification of internal fixation compression arthrodesis for ankle fusion is described using two 6.5-mm cancellous bone screws and a lateral T plate. Using this technique, 20 consecutive arthrodeses by one surgeon were reviewed. Solid union was attained in 19 of 20 patients (95%). Average follow-up was 18 months (range 6–59 months). Time to obtain solid arthrodeses averaged 18 weeks. In 11 patients who returned for follow-up, clinical grading using the Mazur scale score averaged 70 of 90 points. Diagnoses included posttraumatic degenerative arthritis, failed ankle arthrodesis and rheumatoid arthritis (2 each), failed ankle arthroplasty, and posttuberculous arthritis (1 each). Complications included one malunion and one asymptomatic screw malposition. All patients attaining union were pleased with the procedure.
- Published
- 1994
22. LINGSTAT
- Author
-
Paul G. Bamberg, Linda Manganaro, Haakon L. Chevalier, Taiko Dietzel, Jonathan Yamron, Todd P. Margolis, Frank Kampmann, James K. Baker, John Elder, Mark Mandel, and Elizabeth E. Steele
- Subjects
Machine translation; Computer science; business.industry; computer.software_genre; Machine translation software usability; Example-based machine translation; Universal Networking Language; Rule-based machine translation; Computer-assisted translation; Artificial intelligence; business; computer; Language industry; Interactive machine translation; Natural language processing
- Abstract
In this paper we present the first implementation of LINGSTAT, an interactive machine translation system designed to increase the productivity of a user, with little knowledge of the source language, in translating or extracting information from foreign language documents. In its final form, LINGSTAT will make use of statistical information gathered from parallel and single-language corpora, and linguistic information at all levels (lexical, syntactic, and semantic).
- Published
- 1993
- Full Text
- View/download PDF
23. Topic and speaker identification via large vocabulary continuous speech recognition
- Author
-
Larry Gillick, Stephen Lowe, Melvyn J. Hunt, Yoshiko Ito, Janet M. Baker, John Bridle, James K. Baker, Barbara Peskin, Jeremy Orloff, Robert Roth, and Francesco Scattone
- Subjects
Vocabulary; Computer science; business.industry; Speech recognition; media_common.quotation_subject; Speaker recognition; computer.software_genre; Speaker diarisation; Identification (information); Speaker identification; Artificial intelligence; business; computer; Natural language processing; media_common
- Abstract
In this paper we exhibit a novel approach to the problems of topic and speaker identification that makes use of a large vocabulary continuous speech recognizer. We present a theoretical framework which formulates the two tasks as complementary problems, and describe the symmetric way in which we have implemented their solution. Results of trials of the message identification systems using the Switchboard corpus of telephone conversations are reported.
- Published
- 1993
- Full Text
- View/download PDF
24. Large vocabulary recognition of Wall Street Journal sentences at Dragon Systems
- Author
-
Stephen Lowe, Kathleen Bishop, Paul G. Bamberg, Zezhen Huang, Janet M. Baker, Francesco Scattone, Yoshiko Ito, James K. Baker, Vera Helman, Larry Gillick, Robert Roth, and Barbara Peskin
- Subjects
Set (abstract data type); Sequence; Computer science; Speech recognition; Expectation–maximization algorithm; Double exponential function; Word error rate; Probability distribution; Hidden Markov model; Cluster analysis
- Abstract
In this paper we present some of the algorithm improvements that have been made to Dragon's continuous speech recognition and training programs, improvements that have more than halved our error rate on the Resource Management task since the last SLS meeting in February 1991. We also report the "dry run" results that we have obtained on the 5000-word speaker-dependent Wall Street Journal recognition task, and outline our overall research strategy and plans for the future. In our system, a set of output distributions, known as the set of PELs (phonetic elements), is associated with each phoneme. The HMM for a PIC (phoneme-in-context) is represented as a linear sequence of states, each having an output distribution chosen from the set of PELs for the given phoneme, and a (double exponential) duration distribution. In this paper we report on two methods of acoustic modeling and training. The first method involves generating a set of (unimodal) PELs for a given speaker by clustering the hypothetical frames found in the spectral models for that speaker, and then constructing speaker-dependent PEL sequences to represent each PIC. The "spectral model" for a PIC is simply the expected value of the sequence of frames that would be generated by the PIC. The second method represents the probability distribution for each parameter in a PEL as a mixture of a fixed set of unimodal components, the mixing weights being estimated using the EM algorithm. In both models we assume that the parameters are statistically independent. We report results obtained using each of these two methods (RePELing/Respelling and univariate "tied mixtures") on the 5000-word closed-vocabulary verbalized punctuation version of the Wall Street Journal task.
- Published
- 1992
- Full Text
- View/download PDF
25. Dragon systems resource management benchmark results, February 1991
- Author
-
Lori Lamel, Ousmane Ba, Janet M. Baker, Richard Benedict, Dean Sturtevant, James K. Baker, Francesco Scattone, Robert Roth, Larry Gillick, and Paul G. Bamberg
- Subjects
Sequence; business.industry; Computer science; Markov process; Machine learning; computer.software_genre; Task (project management); symbols.namesake; Benchmark (computing); symbols; Segmentation; Resource management; Artificial intelligence; business; Representation (mathematics); Hidden Markov model; computer
- Abstract
In this paper we present preliminary results obtained at Dragon Systems on the Resource Management benchmark task. The basic conceptual units of our system are Phonemes-in-Context (PICs), which are represented as Hidden Markov Models, each of which is expressed as a sequence of Phonetic Elements (PELs). The PELs corresponding to a given phoneme constitute a kind of alphabet for the representation of PICs. For the speaker-dependent tests, two basic methods of training the acoustic models were investigated. The first method of training the Resource Management models is to re-estimate the models for each test speaker from that speaker's training data, keeping the PEL spellings of the PICs fixed. The second approach is to use the re-estimated models from the first method to derive a segmentation of the training data, then to respell the PICs in a largely speaker-dependent manner in order to improve the representation of speaker differences. A full explanation of these methods is given, as are results using each method. In addition to reporting on two different training strategies, we discuss N-Best results. The N-Best algorithm is a modification of the algorithm proposed by Soong and Huang at the June 1990 workshop. This algorithm runs as a post-processing step and uses an A*-search (an algorithm also known as a 'stack decoder').
- Published
- 1991
- Full Text
- View/download PDF
26. STOCHASTIC MODELING FOR AUTOMATIC SPEECH UNDERSTANDING
- Author
-
James K. Baker
- Subjects
Computer science; business.industry; Speech recognition; Automatic speech; Artificial intelligence; business; computer.software_genre; computer; Natural language processing
- Published
- 1990
- Full Text
- View/download PDF
27. Assisted speech recognition by dual search acceleration technique
- Author
-
James K. Baker
- Subjects
Audio mining; Voice activity detection; Acoustics and Ultrasonics; Arts and Humanities (miscellaneous); Point (typography); Computer science; Speech recognition; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Process (computing); Acoustic model; DUAL (cognitive architecture); Speaker recognition; Speech processing
- Abstract
A speech recognition method, system and program product, the method in one embodiment comprising: obtaining input speech data; initiating a first speech recognition search process with at least one hypothesis; initiating a second speech recognition search process with a plurality of hypotheses; obtaining partial results from the second speech recognition search process, where the partial results include an evaluation of at least one hypothesis that the first speech recognition search process has not evaluated at this point in time; and utilizing the partial results to alter the first speech recognition search process.
- Published
- 2006
- Full Text
- View/download PDF
28. Lexical tree pre-filtering in speech recognition
- Author
-
Robert Roth, Alan Walsh, Laurence S. Gillick, and James K. Baker
- Subjects
Tree (data structure); Speech production; Vocabulary; Acoustics and Ultrasonics; Arts and Humanities (miscellaneous); Computer science; media_common.quotation_subject; Speech recognition; Speech analytics; Speech corpus; State (computer science); Pre filtering; media_common
- Abstract
A speech recognition technique uses lexical tree pre-filtering to obtain lists of words for use in performing speech recognition. The lexical tree pre-filtering includes representing a vocabulary of words using a lexical tree and identifying a first subset of the vocabulary that may correspond to speech spoken beginning at a first time by propagating through the lexical tree information about the speech spoken beginning at the first time. A second subset of the vocabulary that may correspond to speech spoken beginning at a second time is identified by propagating through the lexical tree information about the speech spoken beginning at the second time. Words included in the speech are recognized by comparing speech spoken beginning at the first time with words from the first subset of the vocabulary and speech spoken beginning at the second time with words from the second subset of the vocabulary. The state of the lexical tree is not reset between identifying the first and second subsets.
- Published
- 2000
- Full Text
- View/download PDF
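The lexical tree named in entry 28 can be pictured as a prefix tree in which words that begin with the same sound units share nodes. The following is a minimal sketch under stated assumptions, not the patented method: the names `LexicalTreeNode`, `build_lexical_tree`, and `prefilter` are hypothetical, and an exact match on the leading units stands in for the acoustic information the abstract says is propagated through the tree.

```python
class LexicalTreeNode:
    """One node of a prefix tree over sound units (hypothetical structure)."""

    def __init__(self):
        self.children = {}  # sound unit -> child node
        self.words = []     # vocabulary words whose unit sequence ends here


def build_lexical_tree(pronunciations):
    """Build the tree from a dict mapping each word to its list of units."""
    root = LexicalTreeNode()
    for word, units in pronunciations.items():
        node = root
        for unit in units:
            node = node.children.setdefault(unit, LexicalTreeNode())
        node.words.append(word)
    return root


def words_under(node):
    """Collect every word stored at or below a node."""
    found = list(node.words)
    for child in node.children.values():
        found.extend(words_under(child))
    return found


def prefilter(root, leading_units):
    """Return the vocabulary subset consistent with the leading units.

    Assumption: exact matching replaces the acoustic scoring a real
    recognizer would use while walking the tree.
    """
    node = root
    for unit in leading_units:
        node = node.children.get(unit)
        if node is None:
            return []
    return sorted(words_under(node))
```

Because shared prefixes are stored once, a single walk down the tree narrows the candidate list for every word that starts the same way.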
29. Colonic perforation following mild trauma in a patient with Crohn's disease
- Author
-
James K. Baker and Gary Johnson
- Subjects
Adult; Male; medicine.medical_specialty; Abdominal pain; Colon; medicine.medical_treatment; Perforation (oil well); Inflammatory bowel disease; Crohn Disease; Pneumoperitoneum; Colon surgery; Laparotomy; medicine; Humans; Crohn's disease; business.industry; Sigmoid colon; General Medicine; medicine.disease; digestive system diseases; Surgery; medicine.anatomical_structure; Intestinal Perforation; Emergency Medicine; medicine.symptom; business
- Abstract
A 26-year-old man with a history of Crohn's disease was struck in the abdomen by an opponent's shoulder while playing basketball. He presented to the emergency department 3 hours later with the complaint of abdominal pain and was admitted to the hospital for observation. Nine hours after presentation a computed tomography scan showed pneumoperitoneum, and he then underwent laparotomy. A perforated segment of sigmoid colon with severe inflammatory disease was found and resected. The rest of his small and large bowels were otherwise unremarkable. His localized but severe inflammatory bowel disease predisposed him to bowel perforation with minimal trauma. This is the first report of a patient with inflammatory bowel disease and traumatic colon perforation; it is also the first report of a patient with a bowel perforation with minimal traumatic force.
- Published
- 1990
- Full Text
- View/download PDF
30. Speech recognition system for languages with compound words
- Author
-
Paul G. Bamberg, Jed M. Roberts, Caroline B. Huang, Claudia L. E. Ellermann, James K. Baker, and Stijn Vaeven
- Subjects
Vocabulary; Acoustics and Ultrasonics; Arts and Humanities (miscellaneous); Computer science; media_common.quotation_subject; Compound; Utterance; Linguistics; media_common
- Abstract
A system and associated methods for recognizing compound words from an utterance containing a succession of one or more words from a predetermined vocabulary. At least one of the words in the utterance is a compound word including at least two formatives in succession, wherein those formatives are words in the vocabulary.
- Published
- 1999
- Full Text
- View/download PDF
31. Polemics
- Author
-
James K. Baker, Y. Tandon, Russell Warren Howe, Ama Ata Aidoo, Munhamnu B. Utete, Omufume F. Onoge, Kinoru A. Gaching'a, A. N. Hakam, E. R. Ibira, K. Y. Waibike, and Ali A. Mazrui
- Published
- 1997
- Full Text
- View/download PDF
32. Apparatus and method for training speech recognition systems and their users and otherwise improving speech recognition performance
- Author
-
Elizabeth E. Steele, James K. Baker, and Joel M. Gould
- Subjects
Vocabulary; Acoustics and Ultrasonics; Computer science; Speech recognition; media_common.quotation_subject; SIGNAL (programming language); Context (language use); Arts and Humanities (miscellaneous); Word recognition; Language model; State (computer science); Set (psychology); Word (computer architecture); media_common
- Abstract
A tutorial instructs how to use a word recognition system, such as one for speech recognition. It specifies a set of allowed response words for each of a plurality of states. It sends messages on how to use the recognizer in certain states, and, in others, presents exercises in which the user is to enter signals representing expected words. It scores each such signal against word models to select which response word corresponds to it, and then advances to a state associated with that selected response. This scoring is performed against a large vocabulary even though only a small number of responses are allowed, and the signal is rejected if too many non-allowed words score better than any allowed word. The system comes with multiple sets of standard signal models; it scores each against a given user's signals, selects the set which scores best, and then performs adaptive and batch training upon that set. Preferably, the tutorial prompts users to enter the words used for training in an environment similar to that of the actual recognizer the tutorial is training them to use. The system will normally simulate the recognition of the prompted word, but sometimes it will simulate an error. When it does, it notifies the user if he fails to correct the error. The recognizer associated with the tutorial allows users to perform adaptive training either on all words, or only on those whose recognition has been corrected or confirmed. The recognizer also uses a context language model which indicates the probability that a given word will be used in the context of other words which precede it in a grouping of text.
- Published
- 1996
- Full Text
- View/download PDF
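The rejection rule described in entry 32 (score against a large vocabulary, accept only allowed responses, reject when too many non-allowed words beat every allowed word) can be sketched as follows. This is an illustration under stated assumptions: `select_response` and `max_better_outsiders` are hypothetical names, and higher scores are assumed to mean better matches, which the abstract does not specify.

```python
def select_response(scores, allowed, max_better_outsiders):
    """Pick the best allowed word, or return None to reject the signal.

    scores: dict mapping every scored vocabulary word to a match score
            (assumption: higher is better).
    allowed: set of response words permitted in the current tutorial state.
    max_better_outsiders: hypothetical limit on how many non-allowed words
            may outscore the best allowed word before the signal is rejected.
    """
    allowed_scores = {w: s for w, s in scores.items() if w in allowed}
    if not allowed_scores:
        return None  # nothing permitted was scored at all

    best_word = max(allowed_scores, key=allowed_scores.get)
    best_score = allowed_scores[best_word]

    # Count non-allowed words that beat every allowed word.
    outsiders = sum(
        1 for w, s in scores.items() if w not in allowed and s > best_score
    )
    if outsiders > max_better_outsiders:
        return None
    return best_word
```

Scoring against the full vocabulary is what makes the rejection possible: with only the allowed words scored, an off-script utterance would always be forced onto one of them.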
33. Method for interactive speech recognition and training
- Author
-
Jed M. Roberts, Edward W. Porter, and James K. Baker
- Subjects
Spoken word; Vocabulary; Acoustics and Ultrasonics; Intelligent character recognition; Computer science; media_common.quotation_subject; Speech recognition; Code word; Word error rate; Arts and Humanities (miscellaneous); Artificial Intelligence; Connected speech; media_common; Dictation; General Engineering; Speech technology; Training (meteorology); Acoustic model; Speech corpus; Viseme; Speaker recognition; Computer Science Applications; Line wrap and word wrap; Word lists by frequency; Word recognition; Factored language model; Language model; Natural language; Word (computer architecture)
- Abstract
A method for creating word models for a large vocabulary, natural language dictation system. A user with limited typing skills can create documents with little or no advance training of word models. As the user is dictating, the user speaks a word which may or may not already be in the active vocabulary. The system displays a list of the words in the active vocabulary which best match the spoken word. By keyboard or voice command, the user may choose the correct word from the list or may choose to edit a similar word if the correct word is not on the list. Alternately, the user may type or speak the initial letters of the word. Then the recognition algorithm is called again satisfying the initial letters, and the choices displayed again. A word list is then also displayed from a large backup vocabulary. The best words to display from the backup vocabulary are chosen using a statistical language model and optionally word models derived from a phonemic dictionary. When the correct word is chosen by the user, the speech sample is used to create or update an acoustic model for the word, without further intervention by the user. As the system is used, it also constantly updates its statistical language model. The system gets more and more word models and keeps improving its performance the more it is used. The system may be used for connected speech as well as for discrete utterances.
- Published
- 1993
- Full Text
- View/download PDF
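Entry 33 describes re-running recognition so that the choices satisfy the initial letters the user has typed or spoken. A minimal sketch of that narrowing step is shown below, assuming the recognizer has already produced a score per vocabulary word; `rerecognize`, `n_best`, and the higher-is-better score convention are illustrative assumptions rather than details from the patent.

```python
def rerecognize(scores, typed_prefix, n_best=3):
    """Return the best-scoring words that start with the typed letters.

    scores: dict mapping candidate words to match scores
            (assumption: higher is better).
    typed_prefix: the initial letters supplied by the user; an empty
            string leaves the candidate list unrestricted.
    n_best: hypothetical size of the choice list shown to the user.
    """
    prefix = typed_prefix.lower()
    matching = [
        (word, score)
        for word, score in scores.items()
        if word.lower().startswith(prefix)
    ]
    # Best score first; ties broken alphabetically for a stable display.
    matching.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in matching[:n_best]]
```

Each added letter shrinks the candidate set, so the correct word tends to rise into the displayed list even when its acoustic score alone was not the highest.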
34. Method for representing word models for use in speech recognition
- Author
-
Laurence S. Gillick, Janet M. Baker, James K. Baker, Dean Sturtevant, and Robert Roth
- Subjects
Sequence; Acoustics and Ultrasonics; Computer science; Speech recognition; Acoustic model; Word error rate; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Spelling; Set (abstract data type); Arts and Humanities (miscellaneous); Computer Science::Sound; Metric (mathematics); Factored language model; Cluster analysis; Word (computer architecture)
- Abstract
A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Then each word is represented by cluster spelling representing the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.
- Published
- 1992
- Full Text
- View/download PDF
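Entry 34 mentions clustering acoustic sub-models with the Kullback-Leibler information as a similarity measure. The sketch below illustrates the idea under assumptions the abstract does not state: each sub-model is taken to be a Gaussian with diagonal covariance, the divergence is symmetrized by summing both directions, and a simple greedy threshold rule stands in for whatever clustering procedure the patent uses. All function names are hypothetical.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance, summed per dimension."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def symmetric_kl(a, b):
    """Sum of the two directed divergences; a and b are (means, variances)."""
    return kl_diag_gauss(a[0], a[1], b[0], b[1]) + kl_diag_gauss(
        b[0], b[1], a[0], a[1]
    )


def greedy_cluster(models, threshold):
    """Group sub-models whose divergence from a cluster's first member is small.

    models: dict mapping a sub-model name to (means, variances).
    threshold: hypothetical cutoff; the first member of each cluster acts
               as its representative.
    """
    clusters = []
    for name, model in models.items():
        for cluster in clusters:
            if symmetric_kl(model, models[cluster[0]]) < threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Once sub-models are grouped this way, a word can be written as the sequence of cluster labels its sub-models fall into, which is the "cluster spelling" the abstract refers to.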
35. Interactive speech recognition apparatus
- Author
-
James K. Baker
- Subjects
Audio mining; Vocabulary; Acoustics and Ultrasonics; Computer science; Speech recognition; media_common.quotation_subject; String (computer science); Word error rate; Context (language use); Arts and Humanities (miscellaneous); Language model; Transcription (software); Word (computer architecture); media_common
- Abstract
A speech recognition system which can perform multiple recognition passes on each word. If the recognizer is correct in its first pass, the operator may abort later passes by either pressing a key or speaking the next word. Otherwise, the operator may either wait for a second recognition pass to be performed against a larger vocabulary, or may specify one or more initial letters causing the second recognition pass to be performed against a vocabulary substantially restricted to words starting with those initial letters. Each time the user adds an additional letter to the initial string, any previous recognition is aborted and the re-recognition process is started anew with the new string. If the user types a control character after the initial string, then the string itself is used as the output of the recognizer. In one embodiment, a language model limits a relatively small vocabulary used in the first pass to the words most likely to occur given the language context of the dictated word. The system may also be used as an interactive transcription system for prerecorded speech and can operate on either discrete utterances or continuous speech. When used with prerecorded speech, the system displays the best scoring words of a recognition to the user, and, when the user chooses a desired word from such a display, the system employs the portion of prerecorded speech matched against the chosen word to help determine where in that prerecorded speech the system should look for the next word to recognize.
- Published
- 1992
- Full Text
- View/download PDF
36. Method for creating and using multiple‐word sound models in speech recognition
- Author
-
James K. Baker, Laurence S. Gillick, Paul G. Bamberg, and Robert Roth
- Subjects
Vocabulary; Matching (statistics); Acoustics and Ultrasonics; Arts and Humanities (miscellaneous); Computer science; Speech recognition; media_common.quotation_subject; Process (computing); Acoustic model; Word error rate; Word (computer architecture); Utterance; media_common
- Abstract
A first speech recognition method receives an acoustic description of an utterance to be recognized and scores a portion of that description against each of a plurality of cluster models representing similar sounds from different words. The resulting score for each cluster is used to calculate a word score for each word represented by that cluster. Preferably these word scores are used to prefilter vocabulary words, and the description of the utterance includes a succession of acoustic descriptions which are compared by linear time alignment against a succession of acoustic models. A second speech recognition method is also provided which matches an acoustic model with each of a succession of acoustic descriptions of an utterance to be recognized. Each of these models has a probability score for each vocabulary word. The probability scores for each word associated with the matching acoustic models are combined to form a total score for that word. The preferred speech recognition method calculates two separate word scores for each currently active vocabulary word from a common succession of sounds. Preferably the first score is calculated by a time alignment method, while the second score is calculated by a time independent method. Preferably this calculation of two separate word scores is used in one of multiple word-selecting phases of a recognition process, such as in the prefiltering phase.
- Published
- 1991
- Full Text
- View/download PDF
37. Speech recognition training method
- Author
-
Chin-hui Lee, John W. Klovstad, Kalyan Ganesan, and James K. Baker
- Subjects
Silence; Audio mining; Voice activity detection; Acoustics and Ultrasonics; Arts and Humanities (miscellaneous); Computer science; Template matching; Speech recognition; Speech coding; Acoustic model; Linear predictive coding; Speech processing; Utterance
- Abstract
A speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech, by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus thereby increasing the speech recognition capacity of the apparatus. Template pattern generation is advantageously aided by using a "joker" word to specify the time boundaries of utterances spoken in isolation, by finding the beginning and ending of an utterance surrounded by silence.
- Published
- 1990
38. Speech recognition apparatus and method
- Author
-
Mark Franklin Sidell, James K. Baker, Robert Roth, and Paul G. Bamberg
- Subjects
Audio mining ,Vocabulary ,Voice activity detection ,Acoustics and Ultrasonics ,Computer science ,media_common.quotation_subject ,Speech recognition ,Acoustic model ,Speaker recognition ,Speech processing ,Arts and Humanities (miscellaneous) ,Language model ,Word (computer architecture) ,media_common
- Abstract
A system is disclosed for recognizing a pattern in a collection of data given a context of one or more other patterns previously identified. Preferably the system is a speech recognition system, the patterns are words and the collection of data is a sequence of acoustic frames. During the processing of each of a plurality of frames, for each word in an active vocabulary, the system updates a likelihood score representing a probability of a match between the word and the frame, combines a language model score based on one or more previously recognized words with that likelihood score, and prunes the word from the active vocabulary if the combined score is below a threshold. A rapid match is made between the frames and each word of an initial vocabulary to determine which words should originally be placed in the active vocabulary. Preferably the system enables an operator to confirm the system's best guess as to the spoken word merely by speaking another word, to indicate that an alternate guess by the system is correct by typing a key associated with that guess, and to indicate that neither the best guess nor the alternate guesses was correct by typing yet another key. The system includes other features, including ones for determining where among the frames to look for the start of speech, and a special hardware processor for computing likelihood scores.
- Published
- 1990
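The per-frame update-and-prune loop described in this entry's abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `prune_active_vocabulary`, the use of log-domain scores, the choice to keep the language-model score out of the accumulated acoustic score, and the toy numbers are all hypothetical.

```python
def prune_active_vocabulary(active, frame_scores, lm_scores, threshold):
    """Update each active word's running score with one frame, then prune.

    active:       dict word -> accumulated acoustic log-likelihood so far
    frame_scores: dict word -> log-likelihood of the current frame for that word
    lm_scores:    dict word -> language-model log-probability given prior words
    threshold:    words whose combined score falls below this are dropped
    """
    survivors = {}
    for word, acoustic in active.items():
        updated = acoustic + frame_scores[word]   # likelihood update for this frame
        combined = updated + lm_scores[word]      # add the language-model score
        if combined >= threshold:                 # prune anything below threshold
            survivors[word] = updated
    return survivors


# Toy data (assumed values, for illustration only).
active = {"cat": -2.0, "cap": -2.5, "dog": -6.0}
frame_scores = {"cat": -1.0, "cap": -1.5, "dog": -4.0}
lm_scores = {"cat": -0.5, "cap": -3.0, "dog": -1.0}
remaining = prune_active_vocabulary(active, frame_scores, lm_scores, threshold=-8.0)
```

With these numbers, "dog" has a combined score of -11.0 and is pruned, while "cat" and "cap" stay in the active vocabulary for the next frame.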
39. Method for speech recognition
- Author
-
Laurence S. Gillick and James K. Baker
- Subjects
Audio mining ,Vocabulary ,Similarity (geometry) ,Acoustics and Ultrasonics ,Computer science ,Speech recognition ,media_common.quotation_subject ,Acoustic model ,Linear predictive coding ,Speech shadowing ,Arts and Humanities (miscellaneous) ,Factored language model ,Word (computer architecture) ,media_common
- Abstract
A method determines if a portion of speech corresponds to a speech pattern by time aligning both the speech and a plurality of speech pattern models against a common time-aligning model. This compensates for speech variation between the speech and the pattern models. The method then compares the resulting time-aligned speech model against the resulting time-aligned pattern models to determine which of the patterns most probably corresponds to the speech. Preferably there are a plurality of time-aligning models, each representing a group of somewhat similar sound sequences which occur in different words. Each of these time-aligning models is scored for similarity against a portion of speech, and the time-aligned speech model and time-aligned pattern models produced by time alignment with the best scoring time-aligning model are compared to determine the likelihood that each speech pattern corresponds to the portion of speech. This is performed for each successive portion of speech. When a portion of speech appears to correspond to a given speech pattern model, a range of likely start times is calculated for the vocabulary word associated with that model, and a word score is calculated to indicate the likelihood of that word starting in that range. The method uses a more computationally intensive comparison between the speech and selected vocabulary words, so as to more accurately determine which words correspond with which portions of the speech. When this more intensive comparison indicates the ending of a word at a given point in the speech, the method selects the best scoring vocabulary words whose range of start times overlaps that ending time, and performs the computationally intensive comparison on those selected words starting at that point in the speech.
- Published
- 1990
40. Method for speech analysis and speech recognition
- Author
-
Paul G. Bamberg, Laurence S. Gillick, James K. Baker, and Robert Roth
- Subjects
Sequence ,Acoustics and Ultrasonics ,Frequency band ,Speech recognition ,Frame (networking) ,Spectrum (functional analysis) ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Function (mathematics) ,Residual frame ,Dynamic programming ,Arts and Humanities (miscellaneous) ,Computer Science::Sound ,Energy (signal processing) ,Mathematics
- Abstract
A method of speech analysis calculates one or more difference parameters for each of a sequence of acoustic frames, where each difference parameter is a function of the difference between an acoustic parameter in one frame and an acoustic parameter in a nearby frame. The method is used in speech recognition which compares the difference parameters of each frame against acoustic models representing speech units, where each speech-unit model has a model of the difference parameters associated with the frames of its speech unit. The difference parameters can be slope parameters or energy difference parameters. Slope parameters are derived by finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with a different frequency band. The resulting parameter indicates the extent to which the frequency of energy in the part of the spectrum represented by the given parameter is going up or going down. Energy difference parameters are calculated as a function of the difference between a given spectral parameter in one frame and a spectral parameter in a nearby frame representing the same frequency band. In one embodiment of the invention, dynamic programming compares the difference parameters of a sequence of frames to be recognized against a sequence of dynamic programming elements associated with each of a plurality of speech-unit models. In another embodiment of the invention, each speech-unit model represents one phoneme, and the speech-unit models for a plurality of phonemes are compared against individual frames, to associate with each such frame the one or more phonemes whose models compare most closely with it.
- Published
- 1990
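The two kinds of difference parameter named in this entry's abstract can be illustrated with a short sketch. It assumes each frame is a list of per-band energies, that the "nearby frame" is the one `lag` frames earlier, and that the "different frequency band" used for slope parameters is the band `band_offset` positions higher. The function names and these choices are hypothetical, since the abstract does not fix them.

```python
def energy_differences(frames, lag=1):
    """Same-band difference between each frame and the frame `lag` steps earlier."""
    return [
        [cur - prev for cur, prev in zip(frames[t], frames[t - lag])]
        for t in range(lag, len(frames))
    ]


def slope_parameters(frames, lag=1, band_offset=1):
    """Cross-band difference: band k now versus band k + band_offset earlier."""
    n_bands = len(frames[0])
    return [
        [frames[t][k] - frames[t - lag][k + band_offset]
         for k in range(n_bands - band_offset)]
        for t in range(lag, len(frames))
    ]


# Three frames of three band energies each (assumed toy values).
frames = [
    [1.0, 4.0, 2.0],
    [2.0, 3.0, 5.0],
    [4.0, 1.0, 5.0],
]
deltas = energy_differences(frames)   # one row per frame after the first
slopes = slope_parameters(frames)     # one fewer column, since the top band has no neighbor
```

In a recognizer along the lines the abstract describes, rows like these would be appended to each frame's parameters before comparison against the speech-unit models.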
41. Speech recognition method
- Author
-
James K. Baker
- Subjects
Vocabulary ,Acoustics and Ultrasonics ,Computer science ,media_common.quotation_subject ,Speech recognition ,Frame (networking) ,Boundary (topology) ,Class (philosophy) ,Function (mathematics) ,Diphone ,Arts and Humanities (miscellaneous) ,Histogram ,otorhinolaryngologic diseases ,Word (computer architecture) ,media_common
- Abstract
Smoothed frame labeling associates phonetic frame labels with a given speech frame as a function of (a) the closeness with which the given frame compares to each of a plurality of acoustic models, (b) which frame labels correspond with a neighboring frame, and (c) transition probabilities which indicate, for the frame labels associated with the neighboring frame, which frame labels are probably associated with the given frame. The smoothed frame labeling is used to divide the speech into segments of frames having the same class of labels. The invention represents words as a collection of known diphone models, each of which models the sound before and after a boundary between segments derived by the smoothed frame labeling. At recognition time, the speech is divided into segments by smoothed frame labeling; diphone models are derived for each boundary between the resulting segments; and the resulting diphone models are compared against the known diphone models to determine which of the known diphone models match the segment boundaries in the speech. Then a combined-displaced-evidence method is used to determine which words occur in the speech. This method detects which acoustic patterns, in the form of the known diphone models, match various portions of the speech. In response to each such match, it associates with the speech an evidence score for each vocabulary word in which that pattern is known to occur. It displaces each such score from the location of its associated matched pattern by the known distance between that pattern and the beginning of the score's word. Then all the evidence scores for a word located in a given portion of the speech are combined to produce a score which indicates the probability of that word starting in that portion of the speech. This score is combined with a score produced by comparing a histogram from a portion of the speech against a histogram of each word. The resulting combined score determines whether a given word should undergo a more detailed comparison against the speech to be recognized.
- Published
- 1990
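The combined-displaced-evidence step in this entry's abstract can be sketched roughly as below. Several details are assumptions made only for illustration: positions and offsets are counted in frames, a "portion of the speech" is a fixed-width bin of `bin_width` frames, and evidence is combined by simple addition. The function name, the diphone labels, and the toy data are hypothetical.

```python
from collections import defaultdict


def combine_displaced_evidence(matches, occurrences, bin_width=5):
    """Accumulate per-word evidence at each match's implied word start.

    matches:     list of (frame_position, pattern_id, score)
    occurrences: dict pattern_id -> list of (word, offset), where offset is
                 the known distance in frames from the word start to the pattern
    Returns a dict (word, bin_index) -> summed evidence, where bin_index
    identifies the stretch of `bin_width` frames containing the word start.
    """
    evidence = defaultdict(float)
    for position, pattern, score in matches:
        for word, offset in occurrences.get(pattern, []):
            start = position - offset      # displace back to the word start
            if start < 0:
                continue                   # word would begin before the speech
            evidence[(word, start // bin_width)] += score
    return dict(evidence)


# Toy diphone patterns and matches (assumed values).
occurrences = {
    "k-ae": [("cat", 2), ("cap", 2)],
    "ae-t": [("cat", 8)],
    "ae-p": [("cap", 8)],
}
matches = [(12, "k-ae", 1.0), (19, "ae-t", 2.0), (3, "ae-p", 0.5)]
word_start_scores = combine_displaced_evidence(matches, occurrences)
```

Here two separate matches both point to "cat" starting in the same bin, so their evidence adds up, while the early "ae-p" match is discarded because it would place "cap" before the start of the speech.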
42. The DRAGON system--An overview
- Author
-
James K. Baker
- Subjects
Structure (mathematical logic) ,Vocabulary ,Markov chain ,business.industry ,Computer science ,Speech recognition ,media_common.quotation_subject ,Probabilistic logic ,Markov process ,Acoustic model ,Machine learning ,computer.software_genre ,symbols.namesake ,Simple (abstract algebra) ,Signal Processing ,symbols ,Artificial intelligence ,business ,Function (engineering) ,computer ,media_common
- Abstract
This paper briefly describes the major features of the DRAGON speech understanding system. DRAGON makes systematic use of a general abstract model to represent each of the knowledge sources necessary for automatic recognition of continuous speech. The model--that of a probabilistic function of a Markov process--is very flexible and leads to features which allow DRAGON to function despite high error rates from individual knowledge sources. Repeated use of a simple abstract model produces a system which is simple in structure, but powerful in capabilities.
- Published
- 1975
43. A combinatorial proof of Tucker's lemma for the n-cube
- Author
-
James K. Baker
- Subjects
Discrete mathematics ,Lemma (mathematics) ,business.industry ,Mathematics::Optimization and Control ,Integer lattice ,Combinatorial proof ,Céa's lemma ,Tucker's lemma ,Theoretical Computer Science ,Combinatorics ,Computational Theory and Mathematics ,Discrete Mathematics and Combinatorics ,business ,Subdivision ,Mathematics
- Abstract
Tucker's lemma is a combinatorial result which may be used to derive several theorems in topology. Some basic properties are established for the cube of integer lattice points. Tucker's lemma is then proved by applying a result which was originally presented for the octahedral subdivision of the n-disk.
- Published
- 1970
44. From Field and Study
- Author
-
Robert C. Lasiewski, John Davis, James K. Baker, Martin L. Morton, Sanford D. Schemnitz, Ernest Ables, and Richard C. Banks
- Subjects
Chemical physics ,Energetics ,Animal Science and Zoology ,Biology ,Ecology, Evolution, Behavior and Systematics
- Published
- 1962
45. The American Society of African Culture
- Author
-
James K. Baker
- Subjects
History ,Sociology and Political Science ,Anthropology ,African culture ,Geography, Planning and Development ,African studies
- Abstract
It is supposed that most readers of The Journal of Modern African Studies know something of A.M.S.A.C., its origins, and the programmatic implementation of its purposes; if not, then certainly of its existence. Accordingly, only a summary recapitulation of the Society's history of origin will be made here.
- Published
- 1966
46. Mexican freetail bats: photography
- Author
-
Paul F. Spangle, Harold E. Edgerton, and James K. Baker
- Subjects
Multidisciplinary ,Optics ,business.industry ,Photography ,business ,Tail membrane ,Geology ,Remote sensing
- Abstract
A method is described for photographing bats or other rapidly moving objects as they intercept in space a particular area which is covered by a camera system. Photographs taken at Carlsbad Caverns show that the tail membrane of the Mexican freetail bat is extended when the animal is in flight.
- Published
- 1966
47. Like It Is, May 12, 1991
- Author
-
B. Patrick Bauer; Louis Mahern; Robert D. Garton; Paul S. Mannweiler; William Styring III; Birch "Evan" E. Bayh III; Stephen "Steve" L. Goldsmith; Gabriel E. Aguirre; James K. Baker; Jerry T. Payne; Deborah J. Daniels; John "Jack" J. Thar; David H. McDougal; Richard M. Daley; John D. Tinder
- Abstract
This audio recording of WTLC's news radio show Like It Is details plans for the upcoming General Assembly special session on the state budget, interviews with mayoral candidates Stephen Goldsmith and Louis Mahern, reactions to a proposed expansion of trade with Latin America, the sentencing of a member of the Heilbrunn gang and other local and international drug stories, a conference of narcotics officers, and a pilot program to allow cameras in courtrooms.
48. Notes on the Myotis of the Carlsbad Caverns
- Author
-
James K. Baker
- Subjects
Geography ,Ecology ,Genetics ,Animal Science and Zoology ,Ecology, Evolution, Behavior and Systematics ,Nature and Landscape Conservation
- Published
- 1962
49. Trainable grammars for speech recognition
- Author
-
James K. Baker
- Subjects
Acoustics and Ultrasonics ,Grammar ,Computer science ,media_common.quotation_subject ,Attribute grammar ,Speech recognition ,Link grammar ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Ambiguity ,Mildly context-sensitive grammar formalism ,Context-free grammar ,Grammar induction ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Arts and Humanities (miscellaneous) ,Rule-based machine translation ,Ambiguous grammar ,Grammar-based code ,Affix grammar ,Stochastic grammar ,Stochastic context-free grammar ,Inside–outside algorithm ,Hidden Markov model ,Natural language ,media_common
- Abstract
Algorithms which are based on modeling speech as a finite‐state, hidden Markov process have been very successful in recent years. This paper presents a generalization of these algorithms to certain denumerable‐state, hidden Markov processes. This algorithm permits automatic training of the stochastic analog of an arbitrary context free grammar. In particular, in contrast to many grammatical inference methods, the new algorithm allows the grammar to have an arbitrary degree of ambiguity. Since natural language is often syntactically ambiguous, it is necessary for the grammatical inference algorithm to allow for this ambiguity. Furthermore, allowing ambiguity in the grammar allows errors in the recognition process to be explicitly modeled in the grammar rather than added as an extra component.
- Published
- 1979
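This entry's abstract stresses that the grammar may be arbitrarily ambiguous. One way to see why ambiguity poses no obstacle is to compute the total probability of a string by summing over every parse. The sketch below does only that "inside" pass and is not the paper's training procedure. It assumes the grammar has been put into Chomsky normal form, and the function name, the rule tables, and the toy grammar are all hypothetical.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Total probability of `words` under a stochastic CFG in Chomsky normal form.

    lexical: dict (A, word) -> P(A -> word)
    binary:  dict (A, B, C) -> P(A -> B C)
    The chart sums over all parses, so ambiguous grammars need no special care.
    """
    n = len(words)
    # chart[(i, j)][A] = probability that nonterminal A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        for (lhs, terminal), prob in lexical.items():
            if terminal == word:
                chart[(i, i + 1)][lhs] += prob
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point
                for (lhs, left, right), prob in binary.items():
                    left_p = chart[(i, k)].get(left, 0.0)
                    right_p = chart[(k, j)].get(right, 0.0)
                    if left_p and right_p:
                        chart[(i, j)][lhs] += prob * left_p * right_p
    return chart[(0, n)].get(start, 0.0)


# Ambiguous toy grammar (assumed): S -> S S with 0.5, S -> "a" with 0.5.
lexical = {("S", "a"): 0.5}
binary = {("S", "S", "S"): 0.5}
total = inside_probability(["a", "a", "a"], lexical, binary)
```

The string "a a a" has two parses under this grammar, each with probability 0.5 to the fifth power, so the summed probability is 0.0625.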
50. Parallel pattern verifier with dynamic time warping
- Author
-
Janet M. Baker and James K. Baker
- Subjects
Dynamic time warping ,Acoustics and Ultrasonics ,Arts and Humanities (miscellaneous) ,Matching (graph theory) ,Computer Science::Sound ,Computer science ,Node (networking) ,Path (graph theory) ,Accumulator (computing) ,Algorithm ,Word (computer architecture) ,Event (probability theory)
- Abstract
A speech recognition system is disclosed which employs a network of elementary local decision modules for matching an observed time-varying speech pattern against all possible time warpings of the stored prototype patterns. For each elementary speech segment, an elementary recognizer provides a score indicating the degree of correlation of the input speech segment with stored spectral patterns. Each local decision module receives the results of the elementary recognizer and, at the same time, receives an input from selected ones of the other local decision modules. Each local decision module specializes in a particular node in the network wherein each node matches the probability of how well the input segment of speech matches the particular sound segments in the sounds of the words spoken. Each local decision module takes the prior decisions of all preceding sound segments which are input from the other local decision modules and makes a selection of the locally optimum time warping to be permitted. By this selection technique, each speech segment is stretched or compressed by an arbitrary, nonlinear function based on the control of the interconnections of the other local decision modules to a particular local decision module. Each local decision module includes an accumulator memory which stores the logarithmic probabilities of the current observation which is conditional upon the internal event specified by a word to be matched or identifier of the particular pattern that corresponds to the subject node for that particular pattern. For each observation, these probabilities are computed and loaded into the accumulator memory of all the modules and, the result of the locally optimum time warping representing the accumulated score or network path to a node for the word with the highest probability is chosen.
- Published
- 1987
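This entry's abstract describes a parallel network of local decision modules, each choosing the locally optimum time warping from its predecessors. The same recurrence can be shown in ordinary sequential software. The sketch below is textbook dynamic time warping over scalar sequences with an absolute-difference local cost. It is an assumed simplification and not the disclosed hardware, which accumulates logarithmic probabilities instead of distances.

```python
def dtw_distance(observed, prototype):
    """Minimum accumulated cost of aligning two sequences under time warping."""
    inf = float("inf")
    n, m = len(observed), len(prototype)
    # cost[i][j] = best accumulated cost aligning observed[:i] with prototype[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(observed[i - 1] - prototype[j - 1])
            # Each cell keeps the best of its three predecessors, which is
            # the "locally optimum" choice of stretch, compress, or advance.
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch the prototype
                cost[i][j - 1],      # compress the prototype
                cost[i - 1][j - 1],  # advance both together
            )
    return cost[n][m]


# A slowed-down copy of the prototype aligns with zero cost.
same_word = dtw_distance([1, 2, 2, 3], [1, 2, 3])
other_word = dtw_distance([1, 2, 2, 3], [3, 2, 1])
```

Each cell of the table plays the role of one node in the network: it depends only on its own local score and on the results of a few neighboring nodes, which is what makes a parallel arrangement possible.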