Development of "kata terikat" detection and writing errors correction using Rabin-Karp and random forest algorithm.
- Source :
- AIP Conference Proceedings. 2024, Vol. 3220 Issue 1, p1-15. 15p.
- Publication Year :
- 2024
Abstract
- "Kata terikat" (bound forms) are a word class used frequently in Indonesian journalistic articles, yet they are often written incorrectly. "Kata terikat" fall into three types according to how they are written: as one connected word, separated by a space, or divided by a hyphen (-). This creates an opportunity to automate the detection and correction of "kata terikat" errors. The Rabin-Karp algorithm is used to detect "kata terikat" because of their varied patterns, and the Random Forest algorithm is used to classify and correct those that are used incorrectly. The dataset, provided by Tribun News, consists of nearly 1,000 journalistic articles. The "kata terikat" correction model achieved an accuracy of 86.24%. Three rounds of testing on 10, 20, and 40 journalistic articles from the Tribun News dataset yielded accuracies of 85.71%, 91.67%, and 86.67%, respectively. [ABSTRACT FROM AUTHOR]
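The detection step described in the abstract can be illustrated with a minimal Python sketch of Rabin-Karp matching. This is not the authors' implementation: the short `BOUND_FORMS` list, the helper name `find_bound_forms`, the style labels, and the sample sentence are all assumptions made for illustration. The sketch only reports how each detected form is written (connected, spaced, or hyphenated); deciding whether that form is correct is the job the abstract assigns to the Random Forest classifier, which is not shown here.

```python
BASE = 256
MOD = 1_000_000_007


def rabin_karp(text: str, pattern: str) -> list[int]:
    """Return every index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(BASE, m - 1, MOD)  # weight of the leading character in a window
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        w_hash = (w_hash * BASE + ord(text[i])) % MOD
    hits = []
    for i in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out collisions.
        if p_hash == w_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], then append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return hits


# Illustrative subset only; the paper's actual list of "kata terikat" is not given here.
BOUND_FORMS = ("antar", "pasca", "pra")


def find_bound_forms(text: str) -> list[tuple[int, str, str]]:
    """Locate word-initial bound forms and label how each one is written."""
    lowered = text.lower()
    found = []
    for form in BOUND_FORMS:
        for start in rabin_karp(lowered, form):
            # Skip matches that begin in the middle of a word.
            if start > 0 and lowered[start - 1].isalnum():
                continue
            end = start + len(form)
            nxt = lowered[end] if end < len(lowered) else ""
            if nxt == " ":
                style = "spaced"
            elif nxt == "-":
                style = "hyphenated"
            elif nxt.isalpha():
                style = "connected"
            else:
                style = "standalone"
            found.append((start, form, style))
    return sorted(found)


if __name__ == "__main__":
    sample = "Laga antar kota digelar pasca-pandemi dan prasejarah."
    for start, form, style in find_bound_forms(sample):
        print(start, form, style)
```

Each pattern is searched separately here for clarity; a production detector would more likely hash all patterns of equal length in a single pass over the text.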
- Subjects :
- *RANDOM forest algorithms
- *AUTOMATION
- *ALGORITHMS
- *CLASSIFICATION
Details
- Language :
- English
- ISSN :
- 0094243X
- Volume :
- 3220
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- AIP Conference Proceedings
- Publication Type :
- Conference
- Accession number :
- 180170113
- Full Text :
- https://doi.org/10.1063/5.0235496