Semi-supervised Part-of-speech Tagging in Speech Applications
- Source :
- Interspeech 2010, Makuhari, Japan, 2010, HAL
- Publication Year :
- 2010
- Publisher :
- HAL CCSD, 2010.
Abstract
- When no training or adaptation data is available, semi-supervised training is a good alternative for processing new domains. We perform Bayesian training of a part-of-speech (POS) tagger from unannotated text and a dictionary of possible tags for each word. We complement that method with supervised prediction of possible tags for out-of-vocabulary words, and study the impact of both semi-supervision and starting dictionary size on three representative downstream tasks that use POS tags as features: named entity tagging, semantic role labeling, and ASR output post-processing. Compared to a fully supervised tagger, the outcome is no impact or a small decrease in performance, with potential gains when the supervised tagger suffers from domain mismatch. Tasks that trust the tags completely (like ASR post-processing) are more affected by a reduction of the starting dictionary, but still yield a positive outcome.
- Subjects :
- Computer science
02 engineering and technology
computer.software_genre
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Domain (software engineering)
Reduction (complexity)
030507 speech-language pathology & audiology
03 medical and health sciences
Semantic role labeling
0202 electrical engineering, electronic engineering, information engineering
[INFO]Computer Science [cs]
Adaptation (computer science)
ComputingMilieux_MISCELLANEOUS
Complement (set theory)
business.industry
Part-of-speech tagging
020206 networking & telecommunications
Speech corpus
Linguistics
Named entity
Artificial intelligence
0305 other medical science
Natural language processing
Word (computer architecture)
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Interspeech 2010, Makuhari, Japan, 2010, HAL
- Accession number :
- edsair.doi.dedup.....f04f3e8617b47cd96cf518014d6aaf73