1. Predicting Question Difficulty in Web Surveys: A Machine Learning Approach Based on Mouse Movement Features
- Author
Felix Henninger, Pascal J. Kieslich, Sonja Greven, Frauke Kreuter, and Amanda Fernández-Fontelo
- Subjects
difficulty, Computer science, 050801 communication & media studies, Library and Information Sciences, mouse movements, web surveys, Paradata, Personalization, 0508 media and communications, 050602 political science & public administration, personalization, Movement (music), 300 Social sciences, 05 social sciences, supervised learning models, General Social Sciences, Survey research, paradata, 004 Computer science, Data science, 0506 political science, Computer Science Applications, classification, ddc:300, Imperfect, ddc:004, Law
- Abstract
Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect, and question structure may not be as clear as it could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pretesting, inform real-time interventions through responsive questionnaire design, or indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today, richer data sources are available, for example, the movements respondents make with their mouse, which serve as an additional, detailed indicator of the respondent–survey interaction. This article uses machine learning techniques to explore the value of mouse-tracking data for predicting a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents have answered the easy or difficult version of a question, comparing several state-of-the-art supervised learning methods. We also develop a personalization method that adjusts for respondents’ baseline mouse behavior and evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models.
- Funding
German Research Foundation (DFG)
- Published
2023