Divergence-based fine pruning of phrase-based statistical translation model
- Source :
- Computer Speech & Language. 41:146-160
- Publication Year :
- 2017
- Publisher :
- Elsevier BV, 2017.
Abstract
- Entropy-based pruning is limited in how finely it can select the distribution of phrase pairs to be pruned at a given threshold. In our preliminary empirical analysis, changing the distribution through other divergence metrics improves pruning efficiency. The problematic factors derived are a fixed divergence distribution and the missing impact of word-coupling strength. We propose a fine pruning method that uses two parameters to control these factors, and we analyze their effects on the divergence change. It improves pruning efficiency compared with entropy-based pruning in practical translations of English, Spanish, and French.
A widely used automatic translation approach, phrase-based statistical machine translation, learns a probabilistic translation model composed of phrases from a large parallel corpus with a large language model. The translation model is often enormous because of the many combinations of source and target phrases, which restricts its application in limited computing environments. Entropy-based pruning resolves this issue by reducing the model size while retaining translation quality. To reduce the size safely, this method detects redundant components by evaluating the relative entropy of the models before and after pruning the components. In the literature this method is effective, but we have observed that it can be improved further by adjusting the divergence distribution determined by the relative entropy. From the results of preliminary experiments, we derive two factors responsible for limiting the pruning efficiency of entropy-based pruning. The first factor is the proportion of pairs composing translation models with respect to their translation probability and its estimate. The second factor is the exponential increase of the divergence for pairs with low translation probability and estimate. To control these factors, we propose divergence-based fine pruning, which uses a divergence metric to adapt the curvature change of the boundary conditions for pruning and Laplace smoothing. In practical translation tasks for English-Spanish and English-French language pairs, this method shows a statistically significant improvement in efficiency: up to 50%, and on average 12%, more pruning than entropy-based pruning at the same translation quality.
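The relative-entropy test that the abstract attributes to entropy-based pruning can be illustrated with a minimal sketch. It assumes the commonly used per-pair score p · (log p − log p_est), where p is a pair's translation probability and p_est is its estimate from the model without that pair. The function names, the toy phrase table, and the threshold are hypothetical and are not taken from the paper, and the sketch does not implement the proposed divergence-based fine pruning or its two parameters.

```python
import math


def divergence_contribution(p, p_est):
    """Per-pair relative-entropy score, assuming p > 0 and p_est > 0."""
    return p * (math.log(p) - math.log(p_est))


def prune(phrase_table, threshold):
    """Keep only pairs whose score reaches the threshold.

    A low score means the estimate already approximates the pair's
    probability well, so the pair is treated as redundant and dropped.
    """
    return {
        pair: (p, p_est)
        for pair, (p, p_est) in phrase_table.items()
        if divergence_contribution(p, p_est) >= threshold
    }


# Hypothetical toy table: (source, target) -> (probability, estimate).
table = {
    ("la casa", "the house"): (0.60, 0.58),  # estimate close: redundant
    ("casa", "house"): (0.70, 0.10),         # estimate poor: keep
    ("de la", "of the"): (0.05, 0.04),       # low probability: small score
}

pruned = prune(table, threshold=0.05)
print(sorted(pruned))
```

Raising the threshold prunes more pairs. The abstract's point is that a single fixed score of this kind determines which pairs fall below any given threshold, which is what the proposed method adjusts.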
- Subjects :
- Kullback–Leibler divergence
Machine translation
business.industry
Probabilistic logic
Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)
020206 networking & telecommunications
Pattern recognition
02 engineering and technology
computer.software_genre
Theoretical Computer Science
Human-Computer Interaction
Principal variation search
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Entropy (information theory)
Artificial intelligence
Language model
business
Additive smoothing
computer
Software
Killer heuristic
Mathematics
Details
- ISSN :
- 08852308
- Volume :
- 41
- Database :
- OpenAIRE
- Journal :
- Computer Speech & Language
- Accession number :
- edsair.doi...........af2ccbccdbc8418d70bb84733edd78d0
- Full Text :
- https://doi.org/10.1016/j.csl.2016.06.006