1. Bots or inattentive humans? Identifying sources of low-quality data in online platforms
- Author
-
Cheskie Rosenzweig, Gautam R, Jonathan Robinson, Jaffe Sn, Litman L, and Aaron J. Moss
- Subjects
Text mining ,business.industry ,Computer science ,Data quality ,business ,Data science - Abstract
Online data collection has become indispensable to the social sciences, polling, marketing, and corporate research. However, in recent years, online data collection has been inundated with low quality data. Low quality data threatens the validity of online research and, at times, invalidates entire studies. It is often assumed that random, inconsistent, and fraudulent data in online surveys comes from ‘bots.’ But little is known about whether bad data is caused by bots or ill-intentioned or inattentive humans. We examined this issue on Mechanical Turk (MTurk), a popular online data collection platform. In the summer of 2018, researchers noticed a sharp increase in the number of data quality problems on MTurk, problems that were commonly attributed to bots. Despite this assumption, few studies have directly examined whether problematic data on MTurk are from bots or inattentive humans, even though identifying the source of bad data has important implications for creating the right solutions. Using CloudResearch’s data quality tools to identify problematic participants in 2018 and 2020, we provide evidence that much of the data quality problems on MTurk can be tied to fraudulent users from outside of the U.S. who pose as American workers. Hence, our evidence strongly suggests that the source of low quality data is real humans, not bots. We additionally present evidence that these fraudulent users are behind data quality problems on other platforms.
- Published
- 2021
- Full Text
- View/download PDF