26 results for "Sinharay, Sandip"
Search Results
2. Assessing Fit of the Lognormal Model for Response Times
- Author
- Sinharay, Sandip and van Rijn, Peter W.
- Abstract
Response time models (RTMs) are of increasing interest in educational and psychological testing. This article focuses on the lognormal model for response times, which is one of the most popular RTMs. Several existing statistics for testing normality and the fit of factor analysis models are repurposed for testing the fit of the lognormal model. A simulation study and two real data examples demonstrate the usefulness of the statistics. The Shapiro-Wilk test of normality and a "z"-test for factor analysis models were the most powerful in assessing the misfit of the lognormal model. [For the corresponding grantee submission, see ED603049.]
- Published
- 2020
- Full Text
- View/download PDF
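The approach summarized in entry 2 above, repurposing tests of normality to check the lognormal response-time model, can be illustrated with a minimal sketch. The simulation below is hypothetical (data, parameter values, and variable names are made up for illustration) and shows only the general idea of applying the Shapiro-Wilk test to log response times, not the article's exact procedure.

```python
# Hedged sketch: checking lognormal fit of response times with a Shapiro-Wilk
# test on log response times. All data and names here are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate response times for 500 examinees on 10 items under a lognormal model:
# log T_ij = beta_i - tau_j + e_ij,  e_ij ~ N(0, 1 / alpha_i^2).
n_persons, n_items = 500, 10
beta = rng.normal(4.0, 0.3, n_items)     # hypothetical item time intensities
alpha = rng.uniform(1.5, 2.5, n_items)   # hypothetical item precision parameters
tau = rng.normal(0.0, 0.4, n_persons)    # hypothetical person speed parameters
log_rt = beta[None, :] - tau[:, None] + rng.normal(0.0, 1.0 / alpha, (n_persons, n_items))

# Fit check in the spirit of the abstract: per item, the (centered) log response
# times should look approximately normal if the lognormal model holds.
for i in range(n_items):
    w_stat, p_value = stats.shapiro(log_rt[:, i] - log_rt[:, i].mean())
    print(f"item {i + 1:2d}: Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")
```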
3. Assessing Fit of the Lognormal Model for Response Times
- Author
- Sinharay, Sandip and van Rijn, Peter
- Abstract
Response-time models are of increasing interest in educational and psychological testing. This paper focuses on the lognormal model for response times (van der Linden, 2006), which is one of the most popular response-time models. Several existing statistics for testing normality and the fit of factor-analysis models are repurposed for testing the fit of the lognormal model. A simulation study and two real data examples demonstrate the usefulness of the statistics. The Shapiro-Wilk test of normality (Shapiro & Wilk, 1965) and a Z-test for factor analysis models (Maydeu-Olivares, 2017) were the most powerful in assessing the misfit of the lognormal model. [This is the in-press version of an article published in "Journal of Educational and Behavioral Statistics."]
- Published
- 2020
- Full Text
- View/download PDF
4. A New Person-Fit Statistic for the Lognormal Model for Response Times
- Author
- Sinharay, Sandip
- Abstract
Response-time models are of increasing interest in educational and psychological testing. This paper focuses on the lognormal model for response times (van der Linden, 2006), which is one of the most popular response-time models, and suggests a simple person-fit statistic for the model. The distribution of the statistic under the null hypothesis of no misfit is proved to be a chi-squared distribution. A simulation study and a real data example demonstrate the usefulness of the suggested statistic. [This paper was published in "Journal of Educational Measurement" v55 n4 p457-476 2018 (EJ1198338).]
- Published
- 2018
- Full Text
- View/download PDF
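Entry 4 above refers to a person-fit statistic for van der Linden's (2006) lognormal response-time model with a chi-squared null distribution. A generic statistic of this type, given here only as an orienting sketch and not necessarily in the article's exact form, is the sum of squared standardized log-time residuals:

```latex
% Generic person-fit statistic for the lognormal response-time model (a sketch,
% not the indexed article's exact definition). T_{ij}: response time of person j
% on item i; alpha_i, beta_i: item parameters; tau_j: person speed.
\[
  \log T_{ij} \sim N\!\bigl(\beta_i - \tau_j,\; \alpha_i^{-2}\bigr),
  \qquad
  Z_{ij} = \alpha_i \bigl(\log t_{ij} - \beta_i + \tau_j\bigr),
\]
\[
  L_j = \sum_{i=1}^{I} Z_{ij}^{2}
  \;\overset{H_0}{\sim}\; \chi^{2}_{I}
  \quad \text{approximately (fewer degrees of freedom when parameters are estimated).}
\]
```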
5. Extension of Caution Indices to Mixed-Format Tests
- Author
- Sinharay, Sandip
- Abstract
Tatsuoka (1984) suggested several extended caution indices and their standardized versions that have been used as person-fit statistics by researchers such as Drasgow, Levine, and McLaughlin (1987), Glas and Meijer (2003), and Molenaar and Hoijtink (1990). However, these indices are only defined for tests with dichotomous items. This paper extends two of the popular standardized extended caution indices (Tatsuoka, 1984) for use with polytomous items and mixed-format tests. Two additional new person-fit statistics are obtained by applying the asymptotic standardization of person-fit statistics for mixed-format tests (Sinharay, 2016c). Detailed simulations are then performed to compute the Type I error rate and power of the four new person-fit statistics. Two real data illustrations follow. The new person-fit statistics appear to be satisfactory tools for assessing person fit for polytomous items and mixed-format tests. [This paper will be published in the "British Journal of Mathematical and Statistical Psychology."]
- Published
- 2018
- Full Text
- View/download PDF
6. Automated Trait Scores for 'GRE'® Writing Tasks. Research Report. ETS RR-15-15
- Author
- Attali, Yigal and Sinharay, Sandip
- Abstract
The "e-rater"® automated essay scoring system is used operationally in the scoring of the argument and issue tasks that form the Analytical Writing measure of the "GRE"® General Test. For each of these tasks, this study explored the value added of reporting 4 trait scores for each of these 2 tasks over the total e-rater score. The 4 trait scores are word choice, grammatical conventions, fluency and organization, and content. First, confirmatory factor analysis supported this underlying structure. Next, several alternative ways of determining feature weights for trait scores were compared: weights based on regression parameters of the trait features on human scores, reliability of trait features, and loadings of features from factor analytic results. In addition, augmented trait scores, based on information from other trait scores, were also analyzed. The added value of all trait score variants was evaluated by comparing the ability to predict a particular trait score on one task from either the same trait score on the other task or the e-rater score on the other task. Results supported the use of trait scores and are discussed in terms of their contribution to the construct validity of e-rater as an alternative essay scoring method.
- Published
- 2015
7. Fit of Item Response Theory Models: A Survey of Data from Several Operational Tests. Research Report. ETS RR-11-29
- Author
- Educational Testing Service, Sinharay, Sandip, Haberman, Shelby J., and Jia, Helena
- Abstract
Standard 3.9 of the "Standards for Educational and Psychological Testing" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) demands evidence of model fit when an item response theory (IRT) model is used to make inferences from a data set. We applied two recently suggested methods for assessing goodness of fit of IRT models--generalized residual analysis (Haberman, 2009) and residual analysis for assessing item fit (Bock & Haberman, 2009)--to several operational data sets. We assessed the practical significance of misfit whenever possible. This report summarizes our findings. Though evidence of misfit of the IRT model was found for all the data sets, the misfit was not always practically significant. (Contains 3 tables, 50 figures and 6 notes.)
- Published
- 2011
8. Assessing Fit of Latent Regression Models. Research Report. ETS RR-09-50
- Author
- Sinharay, Sandip, Guo, Zhumei, von Davier, Matthias, and Veldkamp, Bernard P.
- Abstract
The reporting methods used in large-scale educational assessments such as the National Assessment of Educational Progress (NAEP) rely on a "latent regression model". There is a lack of research on the assessment of fit of latent regression models. This paper suggests a simulation-based model-fit technique to assess the fit of such models. The technique consists of investigating whether basic statistical summaries are predicted adequately by the latent regression model. Application of the suggested technique to an operational NAEP data set reveals important information regarding the fit of the latent regression model to the data.
- Published
- 2009
9. The Correlation between Item Parameters and Item Fit Statistics. Research Report. ETS RR-07-36
- Author
- Sinharay, Sandip and Lu, Ying
- Abstract
Dodeen (2004) studied the correlation between the item parameters of the three-parameter logistic model and two item fit statistics, and found some linear relationships (e.g., a positive correlation between item discrimination parameters and item fit statistics) that have the potential for influencing the work of practitioners who employ item response theory. This paper examines the same type of linear relationships as studied in Dodeen. However, this paper adds to the literature by employing item fit statistics not considered in Dodeen, which have been recently suggested and whose Type I error rates have been demonstrated to be generally close to the nominal level. Detailed simulations show that if one uses certain of the recently suggested item fit statistics, there is no need to worry about any linear relationships between the item parameters and item fit statistics.
- Published
- 2007
10. Testing the Untestable Assumptions of the Chain and Poststratification Equating Methods for the NEAT Design. Research Report. ETS RR-06-17
- Author
- Holland, Paul W., von Davier, Alina A., Sinharay, Sandip, and Han, Ning
- Abstract
This paper focuses on the Non-Equivalent Groups with Anchor Test (NEAT) design for test equating and on two classes of observed-score equating (OSE) methods--chain equating (CE) and poststratification equating (PSE). These two classes of methods reflect two distinctly different ways of using the information provided by the anchor test for computing OSE functions. Each of the two classes includes linear and nonlinear equating methods. In practical situations, it is known that the PSE and CE methods tend to give different results when the two groups of examinees differ in ability. However, given that both methods are justified by making untestable assumptions, it is difficult to conclude which, if either, of the two equating approaches is more correct. This study compares predictions from both the PSE and the CE assumptions that can be tested in a comparable way with the data from a special study. Results indicate that both CE and PSE make very similar predictions but that those of CE are slightly more accurate than those of PSE.
- Published
- 2006
11. Limits on Log Cross-Product Ratios for Item Response Models. Research Report. ETS RR-06-10
- Author
- Haberman, Shelby J., Holland, Paul W., and Sinharay, Sandip
- Abstract
Bounds are established for log cross-product ratios (log odds ratios) involving pairs of items for item response models. First, expressions for bounds on log cross-product ratios are provided for unidimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model. Results are also illustrated through an example from a study of model-checking procedures. The bounds obtained can provide a basis for assessment of goodness of fit of these models.
- Published
- 2006
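For orientation, the quantity bounded in entry 11 above, the log cross-product ratio (log odds ratio) for a pair of dichotomous items, has the standard definition below; the bounds themselves are derived in the report and are not reproduced here.

```latex
% Standard definition of the log cross-product (log odds) ratio for a pair of
% dichotomous items i and j; U_i and U_j denote the 0/1 item responses.
\[
  \lambda_{ij}
  = \log \frac{P(U_i = 1,\, U_j = 1)\, P(U_i = 0,\, U_j = 0)}
              {P(U_i = 1,\, U_j = 0)\, P(U_i = 0,\, U_j = 1)} .
\]
```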
12. Model Diagnostics for Bayesian Networks. Research Report. ETS RR-04-17
- Author
- Sinharay, Sandip
- Abstract
Assessing fit of psychometric models has always been an issue of enormous interest, but there exists no unanimously agreed upon item fit diagnostic for the models. Bayesian networks, frequently used in educational assessments (see, for example, Mislevy, Almond, Yan, & Steinberg, 2001) primarily for learning about students' knowledge and skills, are no exception. This paper employs the "posterior predictive model checking method" (Guttman, 1967; Rubin, 1984), a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A number of aspects of model fit, those of usual interest to practitioners, are assessed in this paper using various diagnostic tools. The first diagnostic used is direct data display--a visual comparison of the observed data set and a number of the posterior predictive data sets (that are predicted by the model). The second aspect examined here is item fit. Examinees are grouped into a number of equivalence classes, based on the generated values of their skill variables, and the observed and expected proportion correct scores on an item for the classes are combined to provide a χ[superscript 2]-type and a G[superscript 2]-type test statistic for each item. Another (similar) set of χ[superscript 2]-type and G[superscript 2]-type test statistics is obtained by grouping the examinees by their raw scores and then comparing their observed and expected proportion correct scores on an item. This paper also suggests how to obtain posterior predictive p-values, natural candidate p-values from a Bayesian viewpoint, for the χ[superscript 2]-type and G[superscript 2]-type test statistics. The paper further examines the association among the items, especially if the model can explain the odds ratios corresponding to the responses to pairs of items. Finally, in an effort to examine the issue of differential item functioning (DIF), this paper suggests a version of the Mantel-Haenszel statistic (Holland, 1985), which uses "matched groups" based on equivalence classes, as a discrepancy measure with posterior predictive model checking. Limited simulation studies and a real data application examine the effectiveness of the suggested model diagnostics.
- Published
- 2004
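The posterior predictive model checking method described in entry 12 above has a simple generic recipe: compute a discrepancy measure on the observed data and on data sets replicated from the posterior predictive distribution, then report the proportion of replicates whose discrepancy is at least as extreme as the observed one. The sketch below is a hypothetical stand-in (placeholder data, a placeholder discrepancy, and fake posterior draws), not the report's own diagnostics.

```python
# Hedged, generic sketch of posterior predictive model checking (PPMC).
# The report's discrepancies (item fit, odds ratios, Mantel-Haenszel) would
# replace `discrepancy`; everything here is a placeholder.
import numpy as np

rng = np.random.default_rng(1)

def discrepancy(data):
    """Placeholder discrepancy: observed proportion correct per item."""
    return data.mean(axis=0)

# Placeholder observed 0/1 response matrix (persons x items).
observed = rng.binomial(1, 0.7, size=(200, 5))

# Stand-in posterior draws of per-item correct-response probabilities and the
# corresponding replicated data sets from the posterior predictive distribution.
n_draws = 1000
posterior_p = rng.beta(14.0, 6.0, size=(n_draws, 5))
replicated = rng.binomial(1, posterior_p[:, None, :], size=(n_draws, 200, 5))

t_obs = discrepancy(observed)            # shape (items,)
t_rep = replicated.mean(axis=1)          # discrepancy for each replicated data set

# Posterior predictive p-value per item: P(T(rep) >= T(obs) | observed data).
ppp = (t_rep >= t_obs).mean(axis=0)
print("posterior predictive p-values per item:", np.round(ppp, 3))
```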
13. Assessing Fit of Models with Discrete Proficiency Variable in Educational Assessment. Research Report. RR-04-07
- Author
- Educational Testing Service, Princeton, NJ., Sinharay, Sandip, Almond, Russell, and Yan, Duanli
- Abstract
Model checking is a crucial part of any statistical analysis. As educators tie models for testing to cognitive theory of the domains, there is a natural tendency to represent participant proficiencies with latent variables representing the presence or absence of the knowledge, skills, and proficiencies to be tested (Mislevy, Almond, Yan, & Steinberg, 2001). Model checking for these models is not straightforward, mainly because traditional X[superscript 2]-type tests do not apply except for assessments with a small number of items. Williamson, Mislevy, and Almond (2000) note a lack of published diagnostic tools for these models. This paper suggests a number of graphics and statistics for diagnosing problems with models with discrete proficiency variables. A small diagnostic assessment first analyzed by Tatsuoka (1990) serves as a test bed for these tools. This work is a continuation of the recent work by Yan, Mislevy, and Almond (2003) on this data set. Two diagnostic tools that prove useful are Bayesian residual plots and an analog of the item characteristic curve (ICC) plots. An X[superscript 2]-type statistic based on the latter plot shows some promise, but more work is required to establish the null distribution of the statistic. On the basis of the identified problems with the model used by Mislevy (1995), the suggested diagnostics are helpful to hypothesize an improved model that seems to fit better. (Contains 5 tables and 12 figures.)
- Published
- 2004
14. A New Person-Fit Statistic for the Lognormal Model for Response Times
- Author
- Sinharay, Sandip
- Abstract
Response-time models are of increasing interest in educational and psychological testing. This article focuses on the lognormal model for response times, which is one of the most popular response-time models, and suggests a simple person-fit statistic for the model. The distribution of the statistic under the null hypothesis of no misfit is proved to be a X[superscript 2] distribution. A simulation study and a real data example demonstrate the usefulness of the suggested statistic.
- Published
- 2018
- Full Text
- View/download PDF
15. How to Compare Parametric and Nonparametric Person-Fit Statistics Using Real Data
- Author
- Sinharay, Sandip
- Abstract
Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed simulated data. This article suggests an approach for comparing the performance of parametric and nonparametric PFSs using real data. This article then shows that there is no clear winner between l[subscript z]*, a popular parametric PFS, and H[superscript T], a popular nonparametric statistic, in a comparison using the suggested approach. This finding is contradictory to the common finding shown by Karabatsos, Dimitrov and Smith, and Tendeiro and Meijer that H[superscript T] is more powerful than several parametric PFSs including l[subscript z]* and l[subscript z].
- Published
- 2017
- Full Text
- View/download PDF
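For readers unfamiliar with the parametric statistics discussed in entries 15 and 16: l[subscript z] is the standardized log-likelihood person-fit statistic of Drasgow, Levine, and Williams (1985), and l[subscript z]* is a later refinement that adjusts for the use of an estimated ability. The standard form of l[subscript z] for dichotomous items is sketched below; this is the textbook definition, not material quoted from the articles above.

```latex
% Textbook form of the l_z person-fit statistic for dichotomous items.
% P_i: model-implied probability of a correct response to item i at the
% examinee's ability; u_i: observed 0/1 item score; I: number of items.
\[
  l_0 = \sum_{i=1}^{I} \Bigl[ u_i \log P_i + (1 - u_i) \log (1 - P_i) \Bigr],
  \qquad
  l_z = \frac{l_0 - \mathrm{E}(l_0)}{\sqrt{\mathrm{Var}(l_0)}},
\]
\[
  \mathrm{E}(l_0) = \sum_{i=1}^{I} \Bigl[ P_i \log P_i + (1 - P_i) \log (1 - P_i) \Bigr],
  \qquad
  \mathrm{Var}(l_0) = \sum_{i=1}^{I} P_i (1 - P_i) \Bigl[ \log \tfrac{P_i}{1 - P_i} \Bigr]^{2}.
\]
```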
16. Are the Nonparametric Person-Fit Statistics More Powerful than Their Parametric Counterparts? Revisiting the Simulations in Karabatsos (2003)
- Author
- Sinharay, Sandip
- Abstract
Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristics curves and found the "H[superscript T]" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics, all of which are nonparametric, were found to perform considerably better than each of 25 parametric person-fit statistics. Dimitrov and Smith replicated part of this finding in a similar study. The present article raises some issues with the comparisons performed in Karabatsos and Dimitrov and Smith and points to literature that suggests that the comparisons could have been performed in a more traditional and more fair manner. The present article then replicates the simulations of Karabatsos and demonstrates in several ways that the parametric person-fit statistics l[subscript z] and "ECI4[subscript z]" (that were also considered by Karabatsos) are as powerful as are "H[superscript T]" and "U3" in identifying aberrant examinees in more traditional and fair comparisons. Two parametric person-fit statistics are shown to lead to similar results as "H[superscript T]" and "U3" in a real data example.
- Published
- 2017
- Full Text
- View/download PDF
17. Person Fit Analysis in Computerized Adaptive Testing Using Tests for a Change Point
- Author
- Sinharay, Sandip
- Abstract
Meijer and van Krimpen-Stoop noted that the number of person-fit statistics (PFSs) that have been designed for computerized adaptive tests (CATs) is relatively modest. This article partially addresses that concern by suggesting three new PFSs for CATs. The statistics are based on tests for a change point and can be used to detect an abrupt change in test performance of examinees during a CAT. The Type I error rate and power of the statistics are computed from a detailed simulation study. The performances of the new statistics are compared with those of four existing PFSs using receiver operating characteristics curves. The new statistics are then computed using data from an operational and high-stakes CAT. The new PFSs appear promising for assessment of person fit for CATs.
- Published
- 2016
- Full Text
- View/download PDF
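Entry 17 above describes person-fit statistics for computerized adaptive tests based on tests for a change point. A generic illustration of that idea (not the article's statistics, and with made-up data) is to scan all split points of the administered item sequence and take the largest standardized contrast between residual performance before and after the split:

```python
# Hedged, generic illustration of a change-point style person-fit check for a
# CAT response string. This shows the idea only; the article's statistics differ.
import numpy as np

def change_point_statistic(responses, probs):
    """Max standardized contrast in residual performance over all split points."""
    responses = np.asarray(responses, dtype=float)
    probs = np.asarray(probs, dtype=float)
    resid = responses - probs                 # observed minus expected item score
    var = probs * (1.0 - probs)               # Bernoulli variances
    n = len(responses)
    contrasts = []
    for k in range(1, n):                     # candidate change points
        diff = resid[:k].mean() - resid[k:].mean()
        se = np.sqrt(var[:k].sum() / k**2 + var[k:].sum() / (n - k)**2)
        contrasts.append(abs(diff) / se)
    return max(contrasts)

# Hypothetical examinee who performs as expected early, then guesses late.
rng = np.random.default_rng(3)
probs = np.full(40, 0.7)                      # model-implied P(correct) per item
responses = np.concatenate([rng.binomial(1, 0.70, 20), rng.binomial(1, 0.25, 20)])
print("max standardized contrast:", round(change_point_statistic(responses, probs), 2))
```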
18. Assessment of Fit of Item Response Theory Models Used in Large-Scale Educational Survey Assessments
- Author
- van Rijn, Peter W., Sinharay, Sandip, Haberman, Shelby J., and Johnson, Matthew S.
- Abstract
Latent regression models are used for score-reporting purposes in large-scale educational survey assessments such as the National Assessment of Educational Progress (NAEP) and Trends in International Mathematics and Science Study (TIMSS). One component of these models is based on item response theory. While there exists some research on assessment of fit of item response theory models in the context of large-scale assessments, there is scope for further research on the topic. We suggest two types of residuals to assess the fit of item response theory models in the context of large-scale assessments. The Type I error rates and power of the residuals are computed from simulated data. The residuals are computed using data from four NAEP assessments. Misfit was found for all data sets for both types of residuals, but the practical significance of the misfit was minimal.
- Published
- 2016
- Full Text
- View/download PDF
19. Assessment of Person Fit for Mixed-Format Tests
- Author
- Sinharay, Sandip
- Abstract
Person-fit assessment may help the researcher to obtain additional information regarding the answering behavior of persons. Although several researchers examined person fit, there is a lack of research on person-fit assessment for mixed-format tests. In this article, the lz statistic and the χ2 statistic, both of which have been used for tests with only dichotomous items or with only polytomous items, were modified for use with mixed-format tests. In a detailed simulation, the lz and χ2 statistics are found to be conservative under a (frequentist) asymptotic normal approximation. However, the use of the statistics along with the (Bayesian) posterior predictive model checking method leads to a larger power. The suggested approaches are applied to an operational data set. The approaches appear to be satisfactory tools for assessing person fit for mixed-format tests.
- Published
- 2015
- Full Text
- View/download PDF
20. How Often Is the Misfit of Item Response Theory Models Practically Significant?
- Author
- Sinharay, Sandip and Haberman, Shelby J.
- Abstract
Standard 3.9 of the "Standards for Educational and Psychological Testing" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) demands evidence of model fit when item response theory (IRT) models are fitted to data from tests. Hambleton and Han (2005) and Sinharay (2005) recommended the assessment of practical significance of misfit of IRT models, but few examples of such assessment can be found in the literature concerning IRT model fit. In this article, practical significance of misfit of IRT models was assessed using data from several tests that employ IRT models to report scores. The IRT model did not fit any data set considered in this article. However, the extent of practical significance of misfit varied over the data sets.
- Published
- 2014
- Full Text
- View/download PDF
21. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3
- Author
- College Board, Kobrin, Jennifer L., Sinharay, Sandip, Haberman, Shelby J., and Chajewski, Michael
- Abstract
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results suggest that the linear regression model mostly provides an adequate fit to the data and that more complicated models do not significantly improve the prediction of FYGPA from SAT scores and HSGPA. (Contains 7 tables and 10 figures.)
- Published
- 2011
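Entry 21 above evaluates a multiple linear regression of first-year college GPA on SAT scores and high school GPA. A minimal sketch of that model, fitted by ordinary least squares on simulated placeholder data (the coefficients and variable names are made up, not the study's results), looks like this:

```python
# Hedged sketch: fitting FYGPA ~ SAT + HSGPA by ordinary least squares.
# The data below are simulated placeholders, not the College Board study data.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
sat = rng.normal(1100, 150, n)       # hypothetical composite SAT score
hsgpa = rng.normal(3.3, 0.4, n)      # hypothetical high school GPA
fygpa = 0.5 + 0.001 * sat + 0.4 * hsgpa + rng.normal(0, 0.4, n)

X = np.column_stack([np.ones(n), sat, hsgpa])          # intercept, SAT, HSGPA
coef, _, _, _ = np.linalg.lstsq(X, fygpa, rcond=None)  # OLS solution
fitted = X @ coef
r_squared = 1 - np.sum((fygpa - fitted) ** 2) / np.sum((fygpa - fygpa.mean()) ** 2)

print("intercept, SAT slope, HSGPA slope:", np.round(coef, 4))
print("R^2:", round(r_squared, 3))
```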
22. Reporting of Subscores Using Multidimensional Item Response Theory
- Author
- Haberman, Shelby J. and Sinharay, Sandip
- Abstract
Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in "Appl. Psychol. Meas." 21:25-36, 1997; C.R. Rao and S. Sinharay (Eds), "Handbook of Statistics, vol. 26," pp. 607-642, North-Holland, Amsterdam, 2007; Beguin & Glas in "Psychometrika," 66:471-488, 2001). A MIRT model is fitted using a stabilized Newton-Raphson algorithm (Haberman in "The Analysis of Frequency Data," University of Chicago Press, Chicago, 1974; "Sociol. Methodol." 18:193-211, 1988) with adaptive Gauss-Hermite quadrature (Haberman, von Davier, & Lee in "ETS Research Rep. No. RR-08-45," ETS, Princeton, 2008). A new statistical approach is proposed to assess when subscores using the MIRT model have any added value over (i) the total score or (ii) subscores based on classical test theory (Haberman in "J. Educ. Behav. Stat." 33:204-229, 2008; Haberman, Sinharay, & Puhan in "Br. J. Math. Stat. Psychol." 62:79-95, 2008). The MIRT-based methods are applied to several operational data sets. The results show that the subscores based on MIRT are slightly more accurate than subscore estimates derived by classical test theory. (Contains 5 tables, 4 figures, and 2 footnotes.)
- Published
- 2010
- Full Text
- View/download PDF
23. Limits on Log Odds Ratios for Unidimensional Item Response Theory Models
- Author
- Haberman, Shelby J., Holland, Paul W., and Sinharay, Sandip
- Abstract
Bounds are established for log odds ratios (log cross-product ratios) involving pairs of items for item response models. First, expressions for bounds on log odds ratios are provided for one-dimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model. Results are also illustrated through an example from a study of model-checking procedures. The bounds obtained can provide an elementary basis for assessment of goodness of fit of these models.
- Published
- 2007
- Full Text
- View/download PDF
24. Posterior Predictive Assessment of Item Response Theory Models
- Author
- Sinharay, Sandip, Johnson, Matthew S., and Stern, Hal S.
- Abstract
Model checking in item response theory (IRT) is an underdeveloped area. There is no universally accepted tool for checking IRT models. The posterior predictive model-checking method is a popular Bayesian model-checking tool because it has intuitive appeal, is simple to apply, has a strong theoretical basis, and can provide graphical or numerical evidence about model misfit. An important issue with the application of the posterior predictive model-checking method is the choice of a discrepancy measure (which plays a role like that of a test statistic in traditional hypothesis tests). This article examines the performance of a number of discrepancy measures for assessing different aspects of fit of the common IRT models and makes specific recommendations about what measures are most useful in assessing model fit. Graphical summaries of model-checking results are demonstrated to provide useful insights about model fit. (Contains 13 figures, 3 tables, and 3 notes.)
- Published
- 2006
- Full Text
- View/download PDF
25. Model Diagnostics for Bayesian Networks
- Author
- Sinharay, Sandip
- Abstract
Bayesian networks are frequently used in educational assessments primarily for learning about students' knowledge and skills. There is a lack of work on assessing the fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A number of aspects of model fit, those of usual interest to practitioners, are assessed using various diagnostic tools. This article suggests a direct data display for assessing overall fit, suggests several diagnostics for assessing item fit, suggests a graphical approach to examine if the model can explain the association among the items, and suggests a version of the Mantel-Haenszel statistic for assessing differential item functioning. Limited simulation studies and a real data application demonstrate the effectiveness of the suggested model diagnostics. (Contains 4 tables and 9 figures.)
- Published
- 2006
26. Assessing Fit of Unidimensional Item Response Theory Models Using a Bayesian Approach
- Author
- Sinharay, Sandip
- Abstract
Even though Bayesian estimation has recently become quite popular in item response theory (IRT), there is a lack of work on model checking from a Bayesian perspective. This paper applies the posterior predictive model checking (PPMC) method (Guttman, 1967; Rubin, 1984), a popular Bayesian model checking tool, to a number of real applications of unidimensional IRT models. The applications demonstrate how to exploit the flexibility of the posterior predictive checks to meet the needs of the researcher. This paper also examines practical consequences of misfit, an area often ignored in the educational measurement literature when assessing model fit.
- Published
- 2005
- Full Text
- View/download PDF