
Semi-supervised Part-of-speech Tagging in Speech Applications

Authors :
Benoit Favre
Richard Dufour
Laboratoire Informatique d'Avignon (LIA)
Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Laboratoire d'informatique Fondamentale de Marseille (LIF)
Aix Marseille Université (AMU)-École Centrale de Marseille (ECM)-Centre National de la Recherche Scientifique (CNRS)
Traitement Automatique du Langage Ecrit et Parlé (TALEP)
Laboratoire d'Informatique et Systèmes (LIS)
Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS)
Laboratoire d'Informatique de l'Université du Mans (LIUM)
Le Mans Université (UM)
Centre National de la Recherche Scientifique (CNRS)-École Centrale de Marseille (ECM)-Aix Marseille Université (AMU)
Source :
Interspeech 2010, Makuhari, Japan, 2010 (HAL)
Publication Year :
2010
Publisher :
HAL CCSD, 2010.

Abstract

When no training or adaptation data is available, semi-supervised training is a good alternative for processing new domains. We perform Bayesian training of a part-of-speech (POS) tagger from unannotated text and a dictionary of possible tags for each word. We complement that method with supervised prediction of possible tags for out-of-vocabulary words, and we study the impact of both semi-supervision and starting dictionary size on three representative downstream tasks that use POS tags as features: named entity tagging, semantic role labeling, and ASR output post-processing. The outcome is no impact or a small decrease in performance compared to using a fully supervised tagger, with potential gains when the supervised tagger suffers from domain mismatch. Tasks that trust the tags completely (like ASR post-processing) are more affected by a reduction of the starting dictionary, but still yield a positive outcome.
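
The training setup summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bigram HMM with symmetric Dirichlet priors, sampled with a simplified Gibbs sampler that ignores the small count corrections of an exact collapsed sampler. The function name `gibbs_tag`, the hyperparameters `alpha` and `beta`, and the fallback that lets out-of-vocabulary words take any tag are assumptions made for illustration (the paper instead predicts possible tags for such words with a supervised model).

```python
import random
from collections import Counter


def gibbs_tag(sentences, tag_dict, all_tags, iters=50, alpha=0.1, beta=0.1, seed=0):
    """Tag unannotated sentences with a dictionary-constrained bigram HMM.

    Simplified Gibbs sampling (illustrative assumption, not the paper's code).
    """
    rng = random.Random(seed)
    vocab = {w for s in sentences for w in s}
    n_next = len(all_tags) + 1  # possible successors: any tag or the end marker
    n_words = len(vocab)

    def allowed(word):
        # Dictionary words keep their listed tags; unknown words may take any tag.
        return list(tag_dict.get(word, all_tags))

    # Random initialization that already respects the dictionary.
    tags = [[rng.choice(allowed(w)) for w in s] for s in sentences]

    trans, out, emit, n_tag = Counter(), Counter(), Counter(), Counter()
    for s, ts in zip(sentences, tags):
        padded = ["<s>"] + ts + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            trans[(a, b)] += 1
            out[a] += 1
        for w, t in zip(s, ts):
            emit[(t, w)] += 1
            n_tag[t] += 1

    def shift(prev, t, nxt, w, delta):
        # Add (delta=+1) or remove (delta=-1) the counts tied to one position.
        trans[(prev, t)] += delta
        out[prev] += delta
        trans[(t, nxt)] += delta
        out[t] += delta
        emit[(t, w)] += delta
        n_tag[t] += delta

    for _ in range(iters):
        for s, ts in zip(sentences, tags):
            for i, w in enumerate(s):
                prev = ts[i - 1] if i > 0 else "<s>"
                nxt = ts[i + 1] if i + 1 < len(ts) else "</s>"
                shift(prev, ts[i], nxt, w, -1)
                cands = allowed(w)
                weights = [
                    (trans[(prev, t)] + alpha) / (out[prev] + n_next * alpha)
                    * (trans[(t, nxt)] + alpha) / (out[t] + n_next * alpha)
                    * (emit[(t, w)] + beta) / (n_tag[t] + n_words * beta)
                    for t in cands
                ]
                ts[i] = rng.choices(cands, weights=weights)[0]
                shift(prev, ts[i], nxt, w, +1)
    return tags


# Toy usage: "sleeps" is missing from the dictionary, "barks" is ambiguous.
toy_sentences = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["a", "dog", "sleeps"]]
toy_dict = {"the": ["DET"], "a": ["DET"], "dog": ["NOUN"], "cat": ["NOUN"],
            "barks": ["NOUN", "VERB"]}
toy_tags = gibbs_tag(toy_sentences, toy_dict, ["DET", "NOUN", "VERB"])
```

Whatever the sampler draws, every word listed in the dictionary can only receive one of its listed tags, so a smaller starting dictionary leaves more words to the unconstrained fallback.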

Details

Language :
English
Database :
OpenAIRE
Journal :
Interspeech 2010, Makuhari, Japan, 2010 (HAL)
Accession number :
edsair.doi.dedup.....f04f3e8617b47cd96cf518014d6aaf73