Diagnosing High-Quality Statistical Machine Translation Using Traces of Post-Edition Operations
- Source :
- LREC 2016 proceedings, International Conference on Language Resources and Evaluation-Workshop on Translation Evaluation: From Fragmented Tools and Data Sets to an Integrated Ecosystem (MT Eval 2016), 2016, Portorož, Slovenia. pp.8
- Publication Year :
- 2016
- Publisher :
- HAL CCSD, 2016.
Abstract
- This paper proposes a fine-grained, flexible analysis methodology to reveal the residual difficulties of a high-quality Statistical Machine Translation (SMT) system. The proposal is motivated by the fact that traditional automated metrics are not informative enough to indicate the nature and causes of those residual difficulties. Resolving them is, however, key to improving high-quality output. The novelty of our approach lies in diagnosing Machine Translation (MT) performance by connecting errors, the characteristics of source sentences, and some internal parameters of the system, using traces of Post-Edition (PE) operations as well as Quality Estimation (QE) techniques. Our methodology is illustrated on an SMT system adapted to the medical domain, based on a high-quality English-French parallel corpus of Cochrane systematic review abstracts. Our experimental results show that the main difficulties the system faces concern term precision and the syntactic and stylistic peculiarities of the source language. We furthermore provide general information on the corpus structure and its specificities, including the internal stylistic varieties characteristic of this sub-genre.
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- LREC 2016 proceedings, International Conference on Language Resources and Evaluation-Workshop on Translation Evaluation: From Fragmented Tools and Data Sets to an Integrated Ecosystem (MT Eval 2016), 2016, Portorož, Slovenia. pp.8
- Accession number :
- edsair.dedup.wf.001..e6b401a3e901489bdc1a66b79fff5445