1. Sinhala Transliteration: A Comparative Analysis Between Rule-based and Seq2Seq Approaches
- Author
De Mel, Yomal; Wickramasinghe, Kasun; de Silva, Nisansa; and Ranathunga, Surangika
- Subjects
Computer Science - Computation and Language, F.2.2, I.2.7, 68T50
- Abstract
Due to convenience and a lack of technical literacy, transliteration (i.e., Romanizing native scripts instead of using localization tools) is highly prevalent in low-resource languages such as Sinhala that have their own writing script. In this study, our focus is on Romanized Sinhala transliteration. We propose two methods to address this problem. Our baseline is a rule-based method, which we compare against a second method that treats transliteration as a sequence-to-sequence task, akin to the established Neural Machine Translation (NMT) task. For the latter, we propose a Transformer-based Encoder-Decoder solution. We observed that the Transformer-based method captures many more ad-hoc patterns in Romanized scripts than the rule-based method does. The code base associated with this paper is available on GitHub: https://github.com/kasunw22/Sinhala-Transliterator/
- Comment
8 pages, 7 tables
- Published
- 2024
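To make the idea of a rule-based baseline concrete, the sketch below shows one common way such a transliterator can work: greedy longest-match over a table of Romanized tokens, attaching a dependent vowel sign when a consonant is followed by a vowel and an al-lakuna (U+0DCA) otherwise. The mapping tables and the names `VOWELS`, `CONSONANTS`, `longest_match`, and `transliterate` are hypothetical and deliberately small; this is an illustrative sketch, not the rule set used in the paper or its repository.

```python
# Hypothetical, simplified Romanized-Sinhala scheme (NOT the paper's rule set).
# Each vowel maps to (independent vowel letter, dependent vowel sign).
VOWELS = {
    "aa": ("\u0D86", "\u0DCF"),  # ආ / ා
    "ae": ("\u0D87", "\u0DD0"),  # ඇ / ැ
    "a": ("\u0D85", ""),         # අ / inherent vowel, no sign needed
    "ii": ("\u0D8A", "\u0DD3"),  # ඊ / ී
    "i": ("\u0D89", "\u0DD2"),   # ඉ / ි
    "uu": ("\u0D8C", "\u0DD6"),  # ඌ / ූ
    "u": ("\u0D8B", "\u0DD4"),   # උ / ු
    "ee": ("\u0D92", "\u0DDA"),  # ඒ / ේ
    "e": ("\u0D91", "\u0DD9"),   # එ / ෙ
    "oo": ("\u0D95", "\u0DDD"),  # ඕ / ෝ
    "o": ("\u0D94", "\u0DDC"),   # ඔ / ො
}

# Hypothetical consonant table; two-letter keys win over one-letter keys.
CONSONANTS = {
    "ch": "\u0DA0", "sh": "\u0DC1", "th": "\u0DAD", "dh": "\u0DAF",
    "k": "\u0D9A", "g": "\u0D9C", "j": "\u0DA2", "t": "\u0DA7",
    "d": "\u0DA9", "n": "\u0DB1", "p": "\u0DB4", "b": "\u0DB6",
    "m": "\u0DB8", "y": "\u0DBA", "r": "\u0DBB", "l": "\u0DBD",
    "v": "\u0DC0", "w": "\u0DC0", "s": "\u0DC3", "h": "\u0DC4",
}

HAL = "\u0DCA"  # al-lakuna: suppresses a consonant's inherent vowel


def longest_match(text, i, table):
    """Return the longest key of `table` starting at text[i], or None."""
    for length in (2, 1):
        key = text[i:i + length]
        if len(key) == length and key in table:
            return key
    return None


def transliterate(roman):
    """Greedy longest-match conversion of Romanized Sinhala to Sinhala script."""
    text = roman.lower()
    out = []
    i = 0
    while i < len(text):
        cons = longest_match(text, i, CONSONANTS)
        if cons:
            i += len(cons)
            vowel = longest_match(text, i, VOWELS)
            if vowel:
                # Consonant followed by a vowel: attach the dependent vowel sign.
                out.append(CONSONANTS[cons] + VOWELS[vowel][1])
                i += len(vowel)
            else:
                # Consonant cluster or word-final consonant: add al-lakuna.
                out.append(CONSONANTS[cons] + HAL)
            continue
        vowel = longest_match(text, i, VOWELS)
        if vowel:
            # Vowel with no preceding consonant: use the independent letter.
            out.append(VOWELS[vowel][0])
            i += len(vowel)
            continue
        out.append(text[i])  # spaces, digits, punctuation pass through unchanged
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("ammaa"))     # අම්මා
    print(transliterate("lankaawa"))  # ලන්කාව
```

The second example hints at why fixed rules struggle: the rules emit න් for "n" before "k", whereas the conventional spelling is ලංකාව with the anusvara (ං). Handling such conventions requires ever more special-case rules, while a learned sequence-to-sequence model can, in principle, pick them up from paired training data.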