Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation.

Authors :
Wang, Wei
May, Jonathan
Knight, Kevin
Marcu, Daniel
Source :
Computational Linguistics. Jun 2010, Vol. 36 Issue 2, p247-277. 31p. 14 Diagrams, 7 Charts.
Publication Year :
2010

Abstract

This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0891-2017
Volume :
36
Issue :
2
Database :
Academic Search Index
Journal :
Computational Linguistics
Publication Type :
Academic Journal
Accession number :
50658893
Full Text :
https://doi.org/10.1162/coli.2010.36.2.09054