1. What is the best predictor of word difficulty? A case of data mining using random forest.
- Author
- Ha, Hung Tan, Nguyen, Duyen Thi Bich, and Stoeckel, Tim
- Subjects
- *DATA mining, *RANDOM forest algorithms, *VOCABULARY, *EMPIRICAL research, *DATA analysis
- Abstract
Word frequency has a long history of being considered the most important predictor of word difficulty and has served as a guideline for several aspects of second language vocabulary teaching, learning, and assessment. However, recent empirical research has challenged the supremacy of frequency as a predictor of word difficulty. Accordingly, applied linguists have questioned the use of frequency as the principal criterion in the development of wordlists and vocabulary tests. Despite being informative, previous studies on the topic have been limited in the way the researchers measured word difficulty and the statistical techniques they employed for exploratory data analysis. In the current study, meaning recall was used as a measure of word difficulty, and random forest was employed to examine the importance of various lexical sophistication metrics in predicting word difficulty. The results showed that frequency was not the most important predictor of word difficulty. Due to the limited scope, research findings are only generalizable to Vietnamese learners of English. [ABSTRACT FROM AUTHOR]
- Published
- 2024