Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation.

Authors :: Wang, Wei
May, Jonathan
Knight, Kevin
Marcu, Daniel
Source :: Computational Linguistics. Jun 2010, Vol. 36 Issue 2, p247-277. 31p. 14 Diagrams, 7 Charts.
Publication Year :: 2010
Abstract: This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation. [ABSTRACT FROM AUTHOR]

Subjects :: *PARSING (Grammar)
*SYNTAX (Grammar)
*MACHINE translating
*ARTIFICIAL intelligence
*COMPUTATIONAL linguistics
*APPLIED linguistics
