Ancient-Modern Chinese Translation with a Large Training Dataset
- Publication Year :
- 2018
Abstract
- Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation. Automatic translation from ancient Chinese to modern Chinese helps to preserve and pass on this heritage. However, the lack of a large-scale parallel corpus limits the study of Ancient-Modern Chinese machine translation. In this paper, we propose an Ancient-Modern Chinese clause alignment approach based on the characteristics of these two languages. The method combines lexical and statistical information and achieves an F1-score of 94.2 on our manually annotated test set. We use this method to create a new large-scale Ancient-Modern Chinese parallel corpus containing 1.24M bilingual pairs. To the best of our knowledge, this is the first large, high-quality Ancient-Modern Chinese dataset. Furthermore, we analyze and compare the performance of SMT and various NMT models on this dataset and provide a strong baseline for this task.
- To appear in the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
- Subjects :
- FOS: Computer and information sciences
Computer Science - Computation and Language
General Computer Science
Machine translation
Computer science
Automatic translation
02 engineering and technology
010501 environmental sciences
Translation (geometry)
01 natural sciences
Task (project management)
Manual annotation
Test set
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Artificial intelligence
Baseline (configuration management)
Computation and Language (cs.CL)
Natural language processing
0105 earth and related environmental sciences
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....7a065907bc9531c748072e57cb53dd3b