Stemmer and phonotactic rules to improve n-gram tagger-based Indonesian phonemicization

Authors :
Suyanto Suyanto
Andi Sunyoto
Rezza Nafi Ismail
Ema Rachmawati
Warih Maharani
Source :
Journal of King Saud University: Computer and Information Sciences, Vol 34, Iss 6, Pp 3807-3814 (2022)
Publication Year :
2022
Publisher :
Elsevier, 2022.

Abstract

Phonemicization, or grapheme-to-phoneme (G2P) conversion, is the process of converting a word into its pronunciation. It is an essential component of speech synthesis, speech recognition, and natural language processing. State-of-the-art deep learning (DL)-based G2P models generally give a low phoneme error rate (PER) and word error rate (WER) for high-resource languages, such as English and other European languages, but not for low-resource languages. Conventional machine learning (ML)-based G2P models that incorporate language-specific linguistic knowledge are therefore preferable for low-resource languages. However, these models still perform poorly for several low-resource languages because of various issues. For instance, an Indonesian G2P model works well for roots but gives a high PER for derivatives. Most errors come from ambiguities in some roots and in derivative words containing one of four prefixes: 〈ber〉, 〈meng〉, 〈peng〉, and 〈ter〉. In this research, an Indonesian G2P model based on an n-gram tagger combined with a stemmer and phonotactic rules (NGTSP) is proposed to solve those problems. An evaluation based on 5-fold cross-validation, using 50k Indonesian words, shows that the proposed NGTSP achieves a PER of 0.78%, lower than the 1.14% of the state-of-the-art Transformer-based G2P model. It also provides a much faster processing time.
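
The abstract reports results as PER and WER without defining how they are computed. The sketch below shows one common way to compute them, assuming PER is the total phoneme-level edit distance divided by the total number of reference phonemes, and WER is the share of words whose predicted phoneme sequence is not an exact match. The function names and the two-word toy data are illustrative assumptions, not taken from the paper, and the paper's exact scoring protocol may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Return (PER, WER) in percent over paired phoneme sequences."""
    total_phonemes = sum(len(r) for r in references)
    total_errors = sum(edit_distance(r, h)
                       for r, h in zip(references, hypotheses))
    wrong_words = sum(1 for r, h in zip(references, hypotheses) if r != h)
    per = 100.0 * total_errors / total_phonemes
    wer = 100.0 * wrong_words / len(references)
    return per, wer


# Toy data (made up for illustration): one correct word, one with a
# single substituted phoneme.
references = [["m", "a", "k", "a", "n"], ["b", "ə", "r", "a", "t"]]
hypotheses = [["m", "a", "k", "a", "n"], ["b", "e", "r", "a", "t"]]
per, wer = error_rates(references, hypotheses)
print(f"PER = {per:.2f}%, WER = {wer:.2f}%")
```

Because a single wrong phoneme makes the whole word count as an error, WER is typically much higher than PER on the same predictions.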

Details

Language :
English
ISSN :
1319-1578
Volume :
34
Issue :
6
Database :
Directory of Open Access Journals
Journal :
Journal of King Saud University: Computer and Information Sciences
Publication Type :
Academic Journal
Accession number :
edsdoj.7ccdae126bdf4a5d9743de41ecd8aa6c
Document Type :
article
Full Text :
https://doi.org/10.1016/j.jksuci.2021.01.006