1. StatBreak: Identifying 'lucky' data points through genetic algorithms
- Author
- Marcel Zeelenberg, Anthony M. Evans, Leon P. Hilbert, and Hannes Rosenbusch (affiliations listed: Marketing; Department of Social Psychology)
- Subjects
metapsychology, Computer science, Small number, 05 social sciences, Robust statistics, 050401 social sciences methods, Sample (statistics), outlier detection, open materials, 050105 experimental psychology, Data point, 0504 sociology, replicability, 0501 psychology and cognitive sciences, Anomaly detection, Data mining, General Psychology
- Abstract
Sometimes interesting statistical findings are produced by a small number of “lucky” data points within the tested sample. To address this issue, researchers and reviewers are encouraged to investigate outliers and influential data points. Here, we present StatBreak, an easy-to-apply method, based on a genetic algorithm, that identifies the observations that most strongly contributed to a finding (e.g., effect size, model fit, p value, Bayes factor). Within a given sample, StatBreak searches for the largest subsample in which a previously observed pattern is not present or is reduced below a specifiable threshold. Thus, it answers the following question: “Which (and how few) ‘lucky’ cases would need to be excluded from the sample for the data-based conclusion to change?” StatBreak consists of a simple R function and flags the luckiest data points for any form of statistical analysis. Here, we demonstrate the effectiveness of the method with simulated and real data across a range of study designs and analyses. Additionally, we describe StatBreak’s R function and explain how researchers and reviewers can apply the method to the data they are working with.
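The search described in the abstract (find the largest subsample in which a statistic falls below a threshold) can be illustrated with a minimal genetic-algorithm sketch. This is not StatBreak's R function: the language (Python), the function names `pearson_r`, `fitness`, and `find_lucky_points`, the choice of the absolute Pearson correlation as the statistic, the fitness definition, and all parameter defaults are assumptions made only for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def fitness(mask, xs, ys, threshold, min_n=3):
    """Score an inclusion mask (True = keep the observation).

    Feasible masks (|r| below the threshold) score by how many cases they
    keep; infeasible masks score -|r|, so they always rank below feasible
    ones and improve as the effect shrinks. (Illustrative definition.)
    """
    kept = [i for i, keep in enumerate(mask) if keep]
    if len(kept) < min_n:
        return -2.0
    r = abs(pearson_r([xs[i] for i in kept], [ys[i] for i in kept]))
    return float(len(kept)) if r < threshold else -r


def find_lucky_points(xs, ys, threshold, pop_size=60, generations=200,
                      mutation_rate=0.02, seed=0):
    """Evolve inclusion masks; return (excluded indices, best mask)."""
    rng = random.Random(seed)
    n = len(xs)

    def score(mask):
        return fitness(mask, xs, ys, threshold)

    # Start close to the full sample, and include the full sample itself.
    population = [[rng.random() > 0.1 for _ in range(n)]
                  for _ in range(pop_size)]
    population[0] = [True] * n

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]  # bit-flip mutation
            children.append(child)
        population = parents + children

    best = max(population, key=score)
    excluded = [i for i, keep in enumerate(best) if not keep]
    return excluded, best


# Toy data (made up for this sketch): a symmetric pattern with r = 0,
# plus two extreme points (indices 11 and 12) that create a correlation.
xs = [float(x) for x in range(-5, 6)] + [8.0, 9.0]
ys = [x * x for x in xs[:11]] + [80.0, 90.0]

if __name__ == "__main__":
    excluded, _ = find_lucky_points(xs, ys, threshold=0.3)
    print("excluded indices:", excluded)
```

Because the best masks are carried over unchanged in every generation, the returned mask never scores worse than the full sample. Swapping `pearson_r` for another statistic (a p value, a Bayes factor, a model-fit index) would only require changing `fitness`.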
- Published
- 2020