A Study of Statistical Machine Translation Methods for Under Resourced Languages
- Source :
- SLTU
- Publication Year :
- 2016
- Publisher :
- Elsevier BV, 2016.
Abstract
- This paper contributes an empirical study of five state-of-the-art statistical machine translation (SMT) methods applied to the translation of low-resource languages. The methods studied were phrase-based, hierarchical phrase-based, operation sequence model, string-to-tree, and tree-to-string SMT. Each was applied between English (en) and the under-resourced languages Lao (la), Myanmar (mm), and Thai (th), in both directions. The performance of every system was measured automatically with BLEU and RIBES in all experiments. Our main finding was that phrase-based SMT generally gave the highest BLEU scores. This ran counter to expectations, and we believe it indicates that this method may be more robust to limited data set size. However, under RIBES the best scores came from methods other than phrase-based SMT. This indicates that those methods handled word reordering better, even with limited data. Our study achieved the highest reported results on these data sets for all translation language pairs.
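The abstract's contrast between BLEU and RIBES turns on what each metric rewards. BLEU counts matching local n-grams. RIBES scores global word order through a rank correlation of aligned word positions. The following is a minimal sketch of both under stated assumptions: whitespace-tokenized sentences, a single reference, and no smoothing. The function names `bleu` and `ribes` are hypothetical. The RIBES alignment is deliberately simplified to words that occur exactly once in both sentences. It uses the commonly cited defaults alpha = 0.25 and beta = 0.10. This is not the evaluation tooling used in the study.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU: one reference, uniform weights, no smoothing."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped (modified) n-gram precision.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if matches == 0:
            return 0.0  # geometric mean is zero without smoothing
        log_prec += math.log(matches / total) / max_n
    # Brevity penalty: punish hypotheses not longer than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_prec)


def ribes(hyp, ref, alpha=0.25, beta=0.10):
    """Simplified RIBES (assumption: only words unique in both sentences are aligned)."""
    hyp_counts, ref_counts = Counter(hyp), Counter(ref)
    # Reference positions of aligned words, listed in hypothesis order.
    positions = [ref.index(w) for w in hyp
                 if hyp_counts[w] == 1 and ref_counts[w] == 1]
    if len(positions) < 2:
        return 0.0  # rank correlation is undefined for fewer than two words
    pairs = len(positions) * (len(positions) - 1) // 2
    ascending = sum(1 for i in range(len(positions))
                    for j in range(i + 1, len(positions))
                    if positions[i] < positions[j])
    nkt = ascending / pairs  # normalized Kendall's tau, (tau + 1) / 2
    precision = len(positions) / len(hyp)  # unigram precision of aligned words
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return nkt * precision ** alpha * bp ** beta


if __name__ == "__main__":
    ref = "yesterday she bought a new red bicycle".split()
    hyp = "she bought a new red bicycle yesterday".split()  # one word moved
    print(f"BLEU  = {bleu(hyp, ref):.4f}")   # prints BLEU  = 0.8409
    print(f"RIBES = {ribes(hyp, ref):.4f}")  # prints RIBES = 0.7143
```

In this example, moving "yesterday" costs BLEU only the n-grams that span the moved word. RIBES counts all six word pairs involving "yesterday" as inverted. This is why the two metrics can rank systems differently when reordering is the main difficulty.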
- Subjects :
- Phrase
Machine translation
Translation language
Computer science
Speech recognition
02 engineering and technology
computer.software_genre
Machine translation software usability
Example-based machine translation
03 medical and health sciences
0302 clinical medicine
Rule-based machine translation
0202 electrical engineering, electronic engineering, information engineering
Evaluation of machine translation
Operation Sequence Model
Syntax-based
General Environmental Science
BLEU
business.industry
Phrase-based
Hierarchical Phrase-based
030221 ophthalmology & optometry
General Earth and Planetary Sciences
020201 artificial intelligence & image processing
Synchronous context-free grammar
Artificial intelligence
business
computer
Under resourced languages
Word (computer architecture)
Natural language processing
Details
- ISSN :
- 18770509
- Volume :
- 81
- Database :
- OpenAIRE
- Journal :
- Procedia Computer Science
- Accession number :
- edsair.doi.dedup.....75cd8d03eae236410f5a8fd7e0df602f
- Full Text :
- https://doi.org/10.1016/j.procs.2016.04.057