13 results for "Choi, Ikkyu"
Search Results
2. Benchmark Keystroke Biometrics Accuracy from High-Stakes Writing Tasks. Research Report. ETS RR-21-15
- Author
- Choi, Ikkyu, Hao, Jiangang, Deane, Paul, and Zhang, Mo
- Abstract
"Biometrics" are physical or behavioral human characteristics that can be used to identify a person. It is widely known that keystroke or typing dynamics for short, fixed texts (e.g., passwords) could serve as a behavioral biometric. In this study, we investigate whether keystroke data from essay responses can lead to a reliable biometric measure, with implications for test security and the monitoring of writing fluency and style changes. Based on keystroke data collected from a high-stakes writing testing setting, we established a preliminary biometric benchmark for detecting test-taker identity by using features extracted from their writing process logs. We report a benchmark keystroke biometric accuracy of equal error rate of 4.7% for identifying same versus different individuals on an essay task. In particular, we show that the inclusion of writing process features (e.g., features designed to describe the writing process) in addition to the widely used typing-timing features (e.g., features based on the time intervals between two-letter key sequences) improves the accuracy of the keystroke biometrics. The proposed keystroke biometrics can have important implications for the writing assessments administered through the remotely proctored tests that have been widely adopted during the COVID pandemic.
- Published
- 2021
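The 4.7% figure above is an equal error rate (EER): the operating point at which the false acceptance rate and the false rejection rate coincide. A minimal sketch of that computation from genuine (same-writer) and impostor (different-writer) similarity scores, using NumPy only; the score distributions below are synthetic stand-ins, not the study's data:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Find the threshold where false acceptance ~= false rejection.

    genuine:  similarity scores for same-writer pairs
    impostor: similarity scores for different-writer pairs
    """
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # FAR: impostors accepted; FRR: genuine writers rejected.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))          # closest crossing point
    return (far[i] + frr[i]) / 2, thresholds[i]

# Toy example with synthetic scores (illustrative only).
rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, 1000)
impostor = rng.normal(0.5, 0.1, 1000)
eer, tau = equal_error_rate(genuine, impostor)
print(f"EER ~ {eer:.3f} at threshold {tau:.3f}")
```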
3. Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation
- Author
- Casabianca, Jodi M., Donoghue, John R., Shin, Hyo Jeong, Chao, Szu-Fu, and Choi, Ikkyu
- Abstract
Using item response theory to model rater effects provides an alternative to standard performance metrics for rater monitoring and diagnosis. To fit such models, the ratings data must be sufficiently connected for rater effects to be estimable. Because of the rating designs popular in large-scale testing, there tends to be a large proportion of missing data, yielding sparse matrices and estimation issues. In this article, we explore the impact of different types of connectedness, or linkage, brought about by a linkage set: a collection of responses scored by most or all raters. We also explore the impact of the properties and composition of the linkage set, the connectedness yielded by different rating designs, and the role of scores from automated scoring engines. For monitoring systems that use the rater response version of the generalized partial credit model, the study results suggest using a linkage set, especially a large one composed of responses representing the full score scale. Results also show that a double-human-scoring design provides more connectedness than a design pairing one human with an automated scoring engine, and that scores from automated scoring engines alone do not provide adequate connectedness. We discuss considerations for operational implementation and further study.
- Published
- 2023
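Whether ratings data are "sufficiently connected" can be checked by treating raters and responses as nodes of a bipartite graph and testing whether all raters end up in one connected component. A minimal sketch in plain Python with union-find; the rating records are hypothetical:

```python
def find(parent, x):
    # Path-compressing find for union-find.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def raters_connected(ratings):
    """ratings: iterable of (rater_id, response_id) pairs."""
    parent = {}
    for rater, resp in ratings:
        for node in (("r", rater), ("x", resp)):
            parent.setdefault(node, node)
        a, b = find(parent, ("r", rater)), find(parent, ("x", resp))
        parent[a] = b                     # union rater with response
    raters = {("r", r) for r, _ in ratings}
    roots = {find(parent, r) for r in raters}
    return len(roots) == 1               # one component => connected

# Two raters linked only through a shared linkage-set response (id 9).
print(raters_connected([("A", 1), ("A", 9), ("B", 2), ("B", 9)]))  # True
print(raters_connected([("A", 1), ("B", 2)]))                      # False
```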
4. Evaluating Writing Process Features in an Adult EFL Writing Assessment Context: A Keystroke Logging Study
- Author
- Choi, Ikkyu and Deane, Paul
- Abstract
Keystroke logs provide a comprehensive record of observable writing processes. Previous studies examining the keystroke logs of young L1 English writers performing experimental writing tasks have identified writing process features predictive of response quality. In contrast, large-scale studies on the dynamic and temporal nature of L2 writing processes are scarce, especially in assessment settings. This study used the keystroke logs of adult English as a foreign language (EFL) learners responding to assessment tasks to examine the usefulness of the process features in this new context. We evaluated the features in terms of stability, explored factor structures for their correlations, and constructed models to predict response quality. The results showed that most of the process features were stable and that their correlations could be efficiently represented with a five-factor structure. Moreover, response quality prediction improved over a baseline by up to 48%. These findings have implications for the evaluation and understanding of writing process features and for the substantive understanding of writing processes under assessment conditions.
- Published
- 2021
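A five-factor structure for feature correlations, as reported above, can be explored with ordinary exploratory factor analysis. A sketch using scikit-learn; the feature matrix X (writers by keystroke process features) is a synthetic stand-in, and the 0.4 loading cutoff is an illustrative convention, not the study's:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# X: n_writers x n_features matrix of keystroke process features
# (synthetic stand-in here; real features would be pause/burst-type measures).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))

Z = StandardScaler().fit_transform(X)      # put features on a common scale
fa = FactorAnalysis(n_components=5, random_state=0).fit(Z)

loadings = fa.components_.T                # features x factors
print(loadings.shape)                      # (20, 5)
# Inspect which factor each feature loads on most strongly,
# e.g. flag salient loadings with |loading| > 0.4.
print(np.abs(loadings).argmax(axis=1)[:10])
```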
5. Evaluating Subscore Uses across Multiple Levels: A Case of Reading and Listening Subscores for Young EFL Learners
- Author
- Choi, Ikkyu and Papageorgiou, Spiros
- Abstract
Stakeholders of language tests are often interested in subscores. However, reporting a subscore is not always justified; a subscore should provide reliable and distinct information to be worth reporting. When a subscore is used for decisions across multiple levels (e.g., individual test takers and schools), its reliability and distinctiveness need to be justified at every relevant level. In this study, we examined whether reporting seven Reading and Listening subscores of the "TOEFL Primary"® test, a standardized English proficiency test for young English as a foreign language learners, could be justified at the individual and school levels. We analyzed data from pilot administrations in which 4,776 students from 51 schools participated. We employed the classical test theory (CTT) based approaches of Haberman (2008) and Haberman, Sinharay, and Puhan (2009) for the individual- and school-level investigations, respectively. We supplemented the CTT-based approaches with a factor analytic approach at the individual level and a multilevel modeling approach at the school level. The results differed across the two levels: we found little support for reporting the subscores at the individual level, but strong evidence for the added value of the school-level subscores when the sample size for each school exceeded 50.
- Published
- 2020
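Haberman's CTT criterion, cited above, asks whether the observed subscore predicts its own true score better than the observed total does, comparing proportional reductions in mean squared error (PRMSE). A rough sketch under the usual CTT assumption of measurement errors uncorrelated across subscores; the moment inputs are illustrative values you would estimate from data:

```python
def subscore_adds_value(var_s, var_x, cov_sx, rel_s):
    """Haberman-style check: does subscore s beat total x
    as a predictor of the true subscore?

    var_s, var_x: variances of observed subscore and total
    cov_sx:       covariance between subscore and total
    rel_s:        reliability of the subscore
    Assumes errors uncorrelated across subscores (a CTT assumption).
    """
    prmse_s = rel_s                        # PRMSE of the subscore itself
    # cov(x, true subscore) = cov(x, s) - error variance of s,
    # since x contains s and errors are assumed independent.
    cov_x_st = cov_sx - var_s * (1.0 - rel_s)
    # Squared correlation of the total with the true subscore.
    prmse_x = cov_x_st**2 / (var_x * rel_s * var_s)
    return prmse_s > prmse_x, prmse_s, prmse_x

adds, p_s, p_x = subscore_adds_value(var_s=16.0, var_x=100.0,
                                     cov_sx=30.0, rel_s=0.85)
print(adds, round(p_s, 3), round(p_x, 3))   # True 0.85 0.56
```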
6. The Impact of Operational Scoring Experience and Additional Mentored Training on Raters' Essay Scoring Accuracy
- Author
- Choi, Ikkyu and Wolfe, Edward W.
- Abstract
Rater training is essential to ensuring the quality of constructed response scoring. Most current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Little empirical evidence is available on whether and how raters become more accurate as they gain scoring experience, or on the long-term effects training can have. In this study, we addressed this gap by tracking how the accuracy of new raters changes with experience and by examining the impact of an additional training session on their accuracy in scoring calibration and monitoring essays. We found that, on average, raters' accuracy improved with scoring experience and that individual raters differed in their accuracy trajectories. The estimated average effect of the training was an approximately six percent increase in calibration essay accuracy; we observed a smaller impact on monitoring essay accuracy. Our follow-up analysis showed that this differential impact of the additional training could be accounted for by successful gatekeeping through calibration.
- Published
- 2020
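Accuracy trajectories that differ across raters, as described above, are naturally captured by a mixed-effects growth model: a common experience slope plus rater-specific deviations. One plausible sketch with statsmodels; the long-format data (one row per rater per scoring occasion) are simulated stand-ins, not the study's records:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: 40 raters, 25 scoring occasions each.
rng = np.random.default_rng(2)
raters = np.repeat(np.arange(40), 25)
exp_ = np.tile(np.arange(25), 40)
slope = rng.normal(0.004, 0.002, 40)[raters]   # rater-specific growth
acc = 0.75 + slope * exp_ + rng.normal(0, 0.05, raters.size)
df = pd.DataFrame({"rater": raters, "experience": exp_, "accuracy": acc})

# Random intercept and random experience slope per rater.
model = smf.mixedlm("accuracy ~ experience", df,
                    groups=df["rater"], re_formula="~experience")
fit = model.fit()
print(fit.params["experience"])   # average accuracy gain per occasion
```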
7. Item Response Models for Multiple Attempts with Incomplete Data
- Author
- Bergner, Yoav, Choi, Ikkyu, and Castellano, Katherine E.
- Abstract
Allowing multiple chances to answer constructed response questions is a prevalent feature of computer-based homework and exams. We consider the use of item response theory to estimate item characteristics and student ability when multiple attempts are allowed but no explicit penalty is deducted for extra tries. This is common practice in online formative assessments, where the number of attempts is often unlimited. In these environments, some students may not answer until correct but may instead abandon an item after one or more incorrect tries. We contrast graded and sequential item response models, both unidimensional models that do not explicitly account for factors other than ability. These approaches differ not only in their log-odds assumptions but, importantly, in how they handle incomplete data. We explore the consequences of model misspecification through a simulation study and with four online homework data sets. Our results suggest that model selection is insensitive when data are complete but quite sensitive to whether missing responses are regarded as informative (of inability) or not (e.g., missing at random). Under realistic conditions, a sequential model with parametric degrees of freedom similar to a graded model can account for more response patterns and outperforms the latter in terms of model fit.
- Published
- 2019
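In a sequential (continuation-ratio) model, each extra attempt is a fresh Bernoulli step: the probability of succeeding at attempt k given that the student reached it. A sketch of one item's pattern likelihood, including the choice the abstract highlights, whether an abandoned item contributes an extra unobserved "failure" step (informative missingness) or not (missing at random); the Rasch-type step parameterization is illustrative:

```python
import math

def p_step(theta, b_k):
    # Probability of success at one attempt, Rasch-type step.
    return 1.0 / (1.0 + math.exp(-(theta - b_k)))

def pattern_loglik(theta, b, n_failures, solved, missing_informative=False):
    """Log-likelihood of one item's attempt pattern.

    b:          step difficulties b[0], b[1], ... per attempt
    n_failures: number of incorrect tries observed
    solved:     True if the final attempt was correct
    missing_informative: if the student quit, score the unobserved
        next attempt as an extra failure (informative missingness)
        instead of ignoring it (missing at random).
    """
    ll = sum(math.log(1.0 - p_step(theta, b[k])) for k in range(n_failures))
    if solved:
        ll += math.log(p_step(theta, b[n_failures]))
    elif missing_informative:
        ll += math.log(1.0 - p_step(theta, b[n_failures]))
    return ll

# Fail twice then quit: the MAR and informative treatments differ.
b = [0.0, 0.5, 1.0]
print(pattern_loglik(0.2, b, n_failures=2, solved=False))
print(pattern_loglik(0.2, b, n_failures=2, solved=False,
                     missing_informative=True))
```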
8. Investigating the Benefits of Scaffolding in Assessments of Young English Learners: A Case for Scaffolded Retell Tasks
- Author
- Choi, Ikkyu, Wolf, Mikyung Kim, Pooler, Emilie, Sova, Lorraine, and Faulkner-Bond, Molly
- Abstract
Scaffolding, an instructional strategy of providing support until learners can perform a task on their own, holds potential to improve assessments for test takers who need support to demonstrate their abilities, such as English learners (ELs). In this study, we evaluated the benefits of including scaffolded tasks in standardized speaking assessments for young ELs. Our focus was on scaffolded retell tasks, which require test takers to retell a given story with procedural scaffolds. We collected responses from 233 third-grade ELs to two scaffolded retell tasks and investigated the relationship between the pre- and post-scaffolding retell performances to assess the impact of the scaffolding on post-scaffolding performance. We also examined the psychometric added value of the scaffolding steps and post-scaffolding responses through an application of item response theory (IRT) measurement models. The findings showed that, conditional on pre-scaffolding performance, the EL test takers' performance on the scaffolding steps was positively associated with their post-scaffolding retell performance. Moreover, the responses to the scaffolding steps and the post-scaffolding retell provided additional information about test takers' oral English proficiency. These findings provide empirical support for the benefits of using scaffolding in standardized EL assessments.
- Published
- 2019
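The key empirical claim above, that scaffolding-step performance predicts post-scaffolding retell quality conditional on pre-scaffolding performance, amounts to a regression with the pre-scaffolding score partialed out. A sketch with statsmodels; the variable names and simulated scores are hypothetical stand-ins for the study's measures:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical scores for 233 test takers on one scaffolded retell task.
rng = np.random.default_rng(3)
pre = rng.normal(0, 1, 233)
scaffold = 0.5 * pre + rng.normal(0, 1, 233)
post = 0.4 * pre + 0.3 * scaffold + rng.normal(0, 1, 233)
df = pd.DataFrame({"pre": pre, "scaffold": scaffold, "post": post})

# A positive 'scaffold' coefficient indicates an association with
# post-scaffolding performance after conditioning on the pre score.
fit = smf.ols("post ~ pre + scaffold", data=df).fit()
print(fit.params[["pre", "scaffold"]])
```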
9. Adding Value to Second-Language Listening and Reading Subscores: Using a Score Augmentation Approach
- Author
- Papageorgiou, Spiros and Choi, Ikkyu
- Abstract
This study examined whether reporting subscores for groups of items within a test section assessing a second-language modality (specifically, reading or listening comprehension) added value, from a measurement perspective, to the information already provided by the section scores. We analyzed the responses of 116,489 test takers to reading and listening items from operational administrations of two large-scale international tests of English as a foreign language. To strengthen the reliability of the subscores, and thus improve their added value, we applied a score augmentation method (Haberman, 2008). Our aim was to examine whether reporting augmented subscores for specific groups of reading and listening items could improve the added value of these subscores and consequently justify providing more fine-grained information about test-taker performance. Our analysis indicated that, in general, there was a lack of support for reporting subscores from a psychometric perspective, and that score augmentation only marginally improved their added value. We discuss several implications of our findings for test developers wishing to report more fine-grained information about test performance, and we conclude by arguing that research on how best to report such refined feedback should remain a focus of future efforts related to second-language proficiency tests.
- Published
- 2018
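Score augmentation in the Haberman (2008) sense shrinks an observed subscore toward the information carried by the whole test: the augmented score is the linear combination of observed subscore and total that best predicts the true subscore. A rough sketch under the same CTT assumption as before (errors uncorrelated across subscores); the moment inputs and simulated scores are illustrative:

```python
import numpy as np

def augmented_subscore(s, x, rel_s, var_s, var_x, cov_sx):
    """Best linear predictor of the true subscore from (s, x).

    s, x: arrays of observed subscores and totals
    Assumes CTT with errors uncorrelated across subscores.
    """
    # Covariances of (s, x) with the true subscore s_t:
    c = np.array([rel_s * var_s,                     # cov(s, s_t)
                  cov_sx - var_s * (1.0 - rel_s)])   # cov(x, s_t)
    C = np.array([[var_s, cov_sx],
                  [cov_sx, var_x]])
    beta = np.linalg.solve(C, c)          # regression weights
    s_bar, x_bar = s.mean(), x.mean()
    return s_bar + beta[0] * (s - s_bar) + beta[1] * (x - x_bar)

rng = np.random.default_rng(4)
s = rng.normal(20, 4, 1000)
x = s + rng.normal(40, 8, 1000)
aug = augmented_subscore(s, x, rel_s=0.7, var_s=s.var(),
                         var_x=x.var(), cov_sx=np.cov(s, x)[0, 1])
print(aug[:3].round(2))
```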
10. Exploring the Validity of a Second Language Intercultural Pragmatics Assessment Tool
- Author
- Timpe-Laughlin, Veronika and Choi, Ikkyu
- Abstract
Pragmatics has been a key component of language competence frameworks. While the majority of second/foreign language (L2) pragmatics tests have targeted productive skills, the assessment of receptive pragmatic skills remains a developing field. This study explores validation evidence for a test of receptive L2 pragmatic ability, the American English Sociopragmatic Comprehension Test (AESCT), a Web-based assessment consisting of 54 tasks measuring knowledge of speech acts, routine formulae, and culture-dependent lexical differences. The AESCT is intended to be used as a learning-oriented assessment in university-level applied linguistics classes. The study collected evidence on construct validity supporting the AESCT design as a measure of pragmatic comprehension and as a source of feedback in low-stakes learning contexts. Ninety-seven university-level English language learners took the AESCT along with the Cambridge Placement Test and a background questionnaire on their exposure to target-language input. Descriptive statistics, correlation analyses, and linear regression were used to analyze aspects of construct validity. Results indicate that the AESCT is sufficiently reliable. Overall, learners performed as previous research suggests: sociopragmatic knowledge was related to L2 exposure and L2 proficiency. Alongside directions for future research on L2 pragmatics test validation, implications for instruction-oriented use of the AESCT are discussed.
- Published
- 2017
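The analyses named above (descriptive statistics, correlations, linear regression) can be reproduced in outline with standard tools. A sketch with SciPy and statsmodels; the variables (AESCT total, proficiency measure, hours of L2 exposure) and their simulated values are hypothetical stand-ins for the study's measures:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
import statsmodels.formula.api as smf

# Hypothetical data for 97 learners.
rng = np.random.default_rng(5)
exposure = rng.gamma(2.0, 5.0, 97)           # hours of L2 input
proficiency = rng.normal(0, 1, 97)
aesct = 30 + 2.0 * proficiency + 0.3 * exposure + rng.normal(0, 3, 97)
df = pd.DataFrame({"aesct": aesct, "proficiency": proficiency,
                   "exposure": exposure})

print(df.describe().loc[["mean", "std"]])            # descriptives
print(pearsonr(df["aesct"], df["proficiency"]))      # correlation
fit = smf.ols("aesct ~ proficiency + exposure", data=df).fit()
print(fit.params)                                    # regression
```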
11. Empirical Profiles of Academic Oral English Proficiency from an International Teaching Assistant Screening Test
- Author
- Choi, Ikkyu
- Abstract
Language proficiency constitutes a crucial barrier for prospective international teaching assistants (ITAs). Many US universities administer screening tests to ensure that ITAs possess the required academic oral English proficiency for their TA duties. Such ITA screening tests often elicit a sample of spoken English, which is evaluated in terms of multiple aspects by trained raters. In this light, ITA screening tests provide an advantageous context in which to gather rich information about test taker performances. This study introduces a systematic way of extracting meaningful information for major stakeholders from an ITA screening test administered at a US university. In particular, this study illustrates how academic oral English proficiency profiles can be identified based on test takers' subscale score patterns, and discusses how the resulting profiles can be used as feedback for ITA training and screening policy makers, the ITA training program of the university, ESL instructors, and test takers. The proficiency profiles were identified using finite mixture modeling based on the subscale scores of 960 test takers. The modeling results suggested seven profile groups. These groups were interpreted and labeled based on the characteristic subscale score patterns of their members. The implications of the results are discussed, with the main focus on how such information can help ITA policy makers, the ITA training program, ESL instructors, and test takers make important decisions.
- Published
- 2017
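Finite mixture modeling of subscale score patterns, as used above to identify the seven profile groups, can be sketched with a Gaussian mixture plus an information criterion to choose the number of profiles. The subscale matrix below is synthetic, and the original study's model family may differ from a plain Gaussian mixture:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in: 960 test takers x 4 subscale scores,
# drawn from three distinct score-pattern groups.
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(m, 0.5, size=(320, 4))
               for m in ([1, 1, 1, 1], [2, 1, 2, 1], [3, 3, 2, 2])])

# Fit mixtures with 1..9 components and compare BIC.
bics = {k: GaussianMixture(n_components=k, random_state=0)
              .fit(X).bic(X)
        for k in range(1, 10)}
best_k = min(bics, key=bics.get)
print(best_k)                              # chosen number of profiles

# Mean subscale pattern per profile, used to interpret and label groups.
gm = GaussianMixture(n_components=best_k, random_state=0).fit(X)
print(gm.means_.round(2))                  # characteristic score patterns
```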
12. Structural Equation Modeling Reporting Practices for Language Assessment
- Author
- Ockey, Gary J. and Choi, Ikkyu
- Abstract
Studies that use structural equation modeling (SEM) techniques are increasingly encountered in the language assessment literature. This popularity has created the need for a set of guidelines that can indicate what should be included in a research report and make it possible for research consumers to judge the appropriateness of the interpretations made from a reported study. This article attempts to fill this void by providing a set of reporting guidelines appropriate for language assessment researchers.
- Published
- 2015
13. Review of Pearson Test of English Academic: Building an Assessment Use Argument
- Author
- Wang, Huan, Choi, Ikkyu, and Schmidgall, Jonathan
- Abstract
This review departs from current practice in reviewing tests in that it employs an "argument-based approach" to test validation to guide the review (e.g., Bachman, 2005; Kane, 2006; Mislevy, Steinberg, & Almond, 2002). Specifically, it follows an approach to test development and use that Bachman and Palmer (2010) call the process of "assessment justification," which focuses on investigating the extent to which the intended uses of a particular test can be justified to stakeholders. The justification process includes two interrelated activities. The first is articulating an assessment use argument (AUA; Bachman, 2003, 2005; Bachman & Palmer, 2010), which makes explicit the claims that link test takers' performance to the consequences of test use. The second is collecting evidence to support the claims articulated in the AUA. This review examines the evidence provided by the test developer about the intended uses of the test so that readers and test users will be better informed in making their own judgments.
- Published
- 2012