An Extremely Large Vocabulary Approach to Named Entity Extraction from Speech

Authors :
Atsushi Nakamura
Takaaki Hori
Source :
ICASSP (1)
Publication Year :
2006
Publisher :
IEEE, 2006.

Abstract

This paper describes an approach to Named Entity (NE) extraction from speech data, in which an extremely large vocabulary lexicon including all NEs occurring in a large text corpus is used for Automatic Speech Recognition (ASR). Accordingly, NEs appear in the recognition results just as they are. Our approach is implemented by the following steps: (1) run an NE-tagger for a whole text corpus and make an NE-tagged corpus in which each NE is padded with its category, (2) construct a lexicon and a language model for ASR using the tagged corpus where each NE is considered as a regular word, and (3) run the speech recognizer in one pass. Although a very large vocabulary is necessary to ensure a high coverage of NEs, that is no longer a major problem since we recently achieved real-time extremely large vocabulary ASR using a WFST framework. In experiments on NE extraction from spoken queries for an open-domain question-answering system, our approach yielded higher F-measure values than a conventional approach.
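
The three steps in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the BIO-style labels, the `surface/CATEGORY` token format, the helper names, and the use of plain bigram counts in place of a real language model and WFST recognizer are all assumptions made for illustration. It shows only why NEs can be read straight off the recognition output once each NE is a single vocabulary entry.

```python
from collections import Counter


def merge_entities(tagged_sentence):
    """Collapse each NE span into one token 'surface/CATEGORY' (assumed format).

    tagged_sentence is a list of (token, label) pairs with BIO-style labels
    such as 'B-PERSON', 'I-PERSON', 'O' (an assumed tagger output format).
    """
    merged, span, category = [], [], None

    def flush():
        # Emit the NE span collected so far, if any, as a single token.
        nonlocal span, category
        if span:
            merged.append("_".join(span) + "/" + category)
            span, category = [], None

    for token, label in tagged_sentence:
        if label.startswith("B-"):
            flush()
            span, category = [token], label[2:]
        elif label.startswith("I-") and span and label[2:] == category:
            span.append(token)
        else:
            flush()
            merged.append(token)
    flush()
    return merged


def build_lexicon_and_bigrams(corpus):
    """Step 2 (simplified): count vocabulary entries and bigrams,
    treating every merged NE token as a regular word."""
    lexicon, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + merge_entities(sentence) + ["</s>"]
        lexicon.update(tokens[1:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return lexicon, bigrams


def extract_entities(recognized_tokens):
    """Step 3 (simplified): read NEs off a recognizer output that uses
    the merged vocabulary. Assumes ordinary words contain no '/'."""
    entities = []
    for token in recognized_tokens:
        surface, sep, category = token.rpartition("/")
        if sep:
            entities.append((surface.replace("_", " "), category))
    return entities


# Hypothetical one-sentence corpus standing in for the NE-tagged text corpus.
corpus = [[("who", "O"), ("climbed", "O"),
           ("Mount", "B-LOCATION"), ("Fuji", "I-LOCATION")]]
lexicon, bigrams = build_lexicon_and_bigrams(corpus)
```

Under these assumptions no separate NE-tagging pass over the recognition output is needed: any hypothesis token that carries a category suffix is already an extracted entity.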

Details

Database :
OpenAIRE
Journal :
2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings
Accession number :
edsair.doi...........1989b27b74f66476da19c13542f0b727
Full Text :
https://doi.org/10.1109/icassp.2006.1660185