An Extremely Large Vocabulary Approach to Named Entity Extraction from Speech
- Source :
- ICASSP (1)
- Publication Year :
- 2006
- Publisher :
- IEEE, 2006.
Abstract
- This paper describes an approach to Named Entity (NE) extraction from speech data, in which an extremely large vocabulary lexicon including all NEs occurring in a large text corpus is used for Automatic Speech Recognition (ASR). Accordingly, NEs appear in the recognition results just as they are. Our approach is implemented by the following steps: (1) run an NE-tagger for a whole text corpus and make an NE-tagged corpus in which each NE is padded with its category, (2) construct a lexicon and a language model for ASR using the tagged corpus where each NE is considered as a regular word, and (3) run the speech recognizer in one pass. Although a very large vocabulary is necessary to ensure a high coverage of NEs, that is no longer a major problem since we recently achieved real-time extremely large vocabulary ASR using a WFST framework. In experiments on NE extraction from spoken queries for an open-domain question-answering system, our approach yielded higher F-measure values than a conventional approach.
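The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the inline tag format (`<PERSON>...</PERSON>`), the `surface/CATEGORY` token convention, and the function names `tokenize_tagged`, `build_lexicon`, and `extract_entities` are all assumptions made for illustration. The language model training and the WFST-based recognizer are omitted; the recognizer output is stood in for by a hand-written token list.

```python
import re
from collections import Counter

# Assumed (hypothetical) inline tag format: <CATEGORY>surface text</CATEGORY>
NE_PATTERN = re.compile(r"<([A-Z]+)>(.*?)</\1>")


def ne_to_token(match):
    """Collapse one tagged NE into a single lexicon entry, e.g. Marie_Curie/PERSON."""
    category, surface = match.group(1), match.group(2)
    return "_".join(surface.split()) + "/" + category


def tokenize_tagged(line):
    """Step 1 output -> word sequence in which each NE is one regular word."""
    return NE_PATTERN.sub(ne_to_token, line).split()


def build_lexicon(tagged_lines):
    """Step 2 (lexicon part only): count every word, NEs included."""
    counts = Counter()
    for line in tagged_lines:
        counts.update(tokenize_tagged(line))
    return counts


def extract_entities(hypothesis_tokens):
    """After step 3: NEs are read directly off the recognized word sequence."""
    entities = []
    for tok in hypothesis_tokens:
        surface, sep, category = tok.rpartition("/")
        if sep and surface and category.isupper():
            entities.append((surface.replace("_", " "), category))
    return entities


# Toy NE-tagged corpus (invented example sentences, not data from the paper).
corpus = [
    "<PERSON>Marie Curie</PERSON> was born in <LOCATION>Warsaw</LOCATION>",
    "who discovered radium with <PERSON>Marie Curie</PERSON>",
]
lexicon = build_lexicon(corpus)

# Stand-in for a one-pass recognition result over that lexicon.
hypothesis = ["where", "was", "Marie_Curie/PERSON", "born"]
entities = extract_entities(hypothesis)
```

Because each NE is a single vocabulary entry that carries its own category, no separate tagging pass over the recognition output is needed in this sketch; the cost is a vocabulary that grows with the number of distinct NEs, which is the trade-off the abstract addresses.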
- Subjects :
- Text corpus
Vocabulary
Computer science
Speech recognition
Information storage and retrieval
Feature extraction
Construct (python library)
Speech processing
Lexicon
Language model
Artificial intelligence
Natural language
Natural language processing
Details
- Database :
- OpenAIRE
- Journal :
- 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
- Accession number :
- edsair.doi...........1989b27b74f66476da19c13542f0b727
- Full Text :
- https://doi.org/10.1109/icassp.2006.1660185