
Development of "kata terikat" detection and writing errors correction using Rabin-Karp and random forest algorithm.

Authors :
Siswanto, Vallencius Gavriel Alfredo
Overbeek, Marlinda Vasty
Source :
AIP Conference Proceedings. 2024, Vol. 3220 Issue 1, p1-15. 15p.
Publication Year :
2024

Abstract

"Kata terikat" (bound word forms) are a commonly used word class in Indonesian journalistic articles, yet they are often written incorrectly. "Kata terikat" fall into three types according to how they are written: joined to the adjacent word, separated from it by a space, or joined to it with a hyphen (-). This presents an opportunity to automate the detection and error checking of "kata terikat" usage. The Rabin-Karp algorithm is employed to detect "kata terikat" because of their varied patterns, and the Random Forest algorithm is applied to classify and correct incorrectly used "kata terikat". The dataset used for this research was provided by Tribun News in the form of nearly 1,000 journalistic articles. The "kata terikat" correction model achieved an accuracy of 86.24%. Three rounds of testing were carried out using 10, 20, and 40 journalistic articles from the Tribun News dataset, yielding accuracies of 85.71%, 91.67%, and 86.67%, respectively. [ABSTRACT FROM AUTHOR]
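
The abstract does not describe the authors' implementation, so the following is only a minimal sketch of how Rabin-Karp rolling-hash matching could locate candidate "kata terikat" in a sentence. The function name `rabin_karp_multi`, the constants `BASE` and `MOD`, and the sample bound forms ("antar", "pasca") are assumptions chosen for illustration and are not taken from the paper.

```python
from collections import defaultdict

BASE = 256              # assumed radix for the polynomial hash
MOD = 1_000_000_007     # assumed large prime modulus


def _hash(s):
    """Polynomial hash of s, reduced modulo MOD."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def rabin_karp_multi(text, patterns):
    """Return sorted (index, pattern) pairs for every occurrence of any pattern."""
    # Group patterns by length so one rolling hash per length covers them all.
    by_len = defaultdict(dict)
    for p in patterns:
        if p:
            by_len[len(p)].setdefault(_hash(p), set()).add(p)

    matches = []
    n = len(text)
    for m, table in by_len.items():
        if m > n:
            continue
        high = pow(BASE, m - 1, MOD)   # weight of the window's leading character
        h = _hash(text[:m])
        for i in range(n - m + 1):
            if h in table:
                window = text[i:i + m]
                # Compare the actual strings to rule out hash collisions.
                if window in table[h]:
                    matches.append((i, window))
            if i + m < n:
                # Roll the window: drop text[i], append text[i + m].
                h = ((h - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return sorted(matches)


# Hypothetical usage: the list of bound forms is illustrative only.
sentence = "pertandingan antar kota dan program pasca-sarjana"
for index, form in rabin_karp_multi(sentence, ["antar", "pasca"]):
    following = sentence[index + len(form):index + len(form) + 1]
    print(index, form, repr(following))
```

In this sketch, the character following each match (a space, a hyphen, or a letter) indicates which of the three written forms was used. That is the kind of feature a downstream classifier such as Random Forest could consume, though the features the authors actually used are not stated in the abstract.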

Details

Language :
English
ISSN :
0094-243X
Volume :
3220
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
180170113
Full Text :
https://doi.org/10.1063/5.0235496