Threats to validity in the use and interpretation of script concordance test scores.

Authors :
Lineberry, Matthew
Kreiter, Clarence D
Bordage, Georges
Source :
Medical Education. Dec 2013, Vol. 47 Issue 12, p1175-1183. 9p. 1 Chart, 2 Graphs.
Publication Year :
2013

Abstract

Context Recent reviews have claimed that the script concordance test (SCT) methodology generally produces reliable and valid assessments of clinical reasoning and that the SCT may soon be suitable for high-stakes testing.
Objectives This study is intended to describe three major threats to the validity of the SCT not yet considered in prior research and to illustrate the severity of these threats.
Methods We conducted a review of SCT reports available through the Web of Science database. Additionally, we reanalysed scores from a previously published SCT administration to explore issues related to standard SCT scoring practice.
Results Firstly, the predominant method for aggregate and partial credit scoring of SCTs introduces logical inconsistencies in the scoring key. Secondly, our literature review shows that SCT reliability studies have generally ignored inter-panel, inter-panellist and test-retest measurement error. Instead, studies have focused on observed levels of coefficient alpha, which is neither an informative index of internal structure nor a comprehensive index of reliability for SCT scores. As such, claims that SCT scores show acceptable reliability are premature. Finally, SCT criteria for item inclusion, in concert with a statistical artefact of the SCT format, cause anchors at the extremes of the scale to have less expected credit than anchors near or at the midpoint. Consequently, SCT scores are likely to reflect construct-irrelevant differences in examinees' response styles. This makes the test susceptible to bias against candidates who endorse extreme scale anchors more readily; it also makes two construct-irrelevant test taking strategies extremely effective. In our reanalysis, we found that examinees could drastically increase their scores by never endorsing extreme scale points. Furthermore, examinees who simply endorsed the scale midpoint for every item would still have outperformed most examinees who used the scale as it is intended.
Conclusions Given the severity of these threats, we conclude that aggregate scoring of SCTs cannot be recommended. Recommendations for revisions of SCT methodology are discussed. [ABSTRACT FROM AUTHOR]
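
The abstract does not spell out the aggregate scoring rule it criticises. As a rough illustration only, the sketch below assumes the rule as it is commonly described for SCTs: each scale anchor earns credit equal to the number of panellists who chose it divided by the number who chose the modal anchor. The five-point scale, the function names and all panel responses are hypothetical and are not taken from the article or its reanalysis; the toy data were deliberately constructed so that panellists cluster near the midpoint.

```python
from collections import Counter

# Assumed five-point response scale; the abstract does not specify one.
SCALE = (-2, -1, 0, 1, 2)


def aggregate_key(panel_responses):
    """Build a partial-credit key for one item.

    Assumed rule: credit for an anchor = (panellists choosing it)
    / (panellists choosing the modal anchor).
    """
    counts = Counter(panel_responses)
    modal_count = max(counts.values())
    return {anchor: counts.get(anchor, 0) / modal_count for anchor in SCALE}


def total_score(responses, keys):
    """Sum the credit an examinee earns across items."""
    return sum(key[response] for response, key in zip(responses, keys))


# Hypothetical responses from a ten-member panel on three items.
panels = [
    [-1, -1, 0, 0, 0, 1, -2, 0, -1, 0],
    [1, 1, 2, 1, 0, 0, 1, 2, 1, 0],
    [0, 0, -1, 0, 1, 0, 0, -1, 1, 0],
]
keys = [aggregate_key(panel) for panel in panels]

# An examinee who endorses the midpoint on every item.
midpoint_score = total_score([0, 0, 0], keys)

# An examinee who endorses only extreme anchors.
extreme_score = total_score([-2, 2, -2], keys)

print(f"midpoint-only score: {midpoint_score:.2f}")
print(f"extremes-only score: {extreme_score:.2f}")
```

With these invented panels the midpoint-only examinee collects far more credit than the extremes-only examinee without engaging with any item content. This only illustrates the mechanism the abstract describes; it says nothing about the size of the effect in real SCT data.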

Details

Language :
English
ISSN :
03080110
Volume :
47
Issue :
12
Database :
Academic Search Index
Journal :
Medical Education
Publication Type :
Academic Journal
Accession number :
91898753
Full Text :
https://doi.org/10.1111/medu.12283