Back to Search
Start Over
Sinhala Transliteration: A Comparative Analysis Between Rule-based and Seq2Seq Approaches
- Publication Year :
- 2024
-
Abstract
- Due to reasons of convenience and lack of tech literacy, transliteration (i.e., Romanizing native scripts instead of using localization tools) is eminently prevalent in the context of low-resource languages such as Sinhala, which have their own writing script. In this study, our focus is on Romanized Sinhala transliteration. We propose two methods to address this problem: Our baseline is a rule-based method, which is then compared against our second method where we approach the transliteration problem as a sequence-to-sequence task akin to the established Neural Machine Translation (NMT) task. For the latter, we propose a Transformer-based Encode-Decoder solution. We witnessed that the Transformer-based method could grab many ad-hoc patterns within the Romanized scripts compared to the rule-based method. The code base associated with this paper is available on GitHub - https://github.com/kasunw22/Sinhala-Transliterator/<br />Comment: 8 pages, 7 tables
- Subjects :
- Computer Science - Computation and Language
F.2.2, I.2.7 68T50
Subjects
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2501.00529
- Document Type :
- Working Paper