
How Many Replicators Does It Take to Achieve Reliability? Investigating Researcher Variability in a Crowdsourced Replication

Authors:
Nate Breznau
Eike Mark Rinke
Alexander Wuttke
Hung Hoang Viet Nguyen
Muna Adem
Jule Adriaans
Esra Akdeniz
Amalia Alvarez-Benjumea
Henrik Kenneth Andersen
Daniel Auer
Flavio Azevedo
Oke Bahnsen
Ling Bai
Dave Balzer
Gerrit Bauer
Paul Bauer
Markus Baumann
Sharon Baute
Verena Benoit
Julian Bernauer
Carl Berning
Anna Berthold
Felix S. Bethke
Thomas Biegert
Katharina Blinzler
Johannes Blumenberg
Licia Bobzien
Andrea Bohman
Thijs Bol
Amie Bostic
Zuzanna Brzozowska
Katharina Burgdorf
Kaspar Burger
Kathrin Busch
Juan Carlos Castillo
Nathan Chan
Pablo Christmann
Roxanne Connelly
Christian S. Czymara
Elena Damian
Eline Adriane de Rooij
Alejandro Ecker
Achim Edelmann
Christina Eder
Maureen A. Eger
Simon Ellerbrock
Anna Forke
Andrea Gabriele Forster
Danilo Freire
Chris Gaasendam
Konstantin Gavras
Vernon Gayle
Theresa Gessler
Timo Gnambs
Amélie Godefroidt
Max Grömping
Martin Groß
Stefan Gruber
Tobias Gummer
Andreas Hadjar
Verena Halbherr
Jan Paul Heisig
Sebastian Hellmeier
Stefanie Heyne
Magdalena Hirsch
Mikael Hjerm
Oshrat Hochman
Jan H. Höffler
Andreas Hövermann
Sophia Hunger
Christian Hunkler
Nora Huth
Zsofia Ignacz
Sabine Israel
Laura Jacobs
Jannes Jacobsen
Bastian Jaeger
Sebastian Jungkunz
Nils Jungmann
Jennifer Kanjana
Mathias Kauff
Salman Khan
Sayak Khatua
Manuel Kleinert
Julia Klinger
Jan-Philipp Kolb
Marta Kolczynska
John Seungmin Kuk
Katharina Kunißen
Dafina Kurti Sinatra
Alexander Greinert
Robin C. Lee
Philipp M. Lersch
David Liu
Lea-Maria Löbel
Philipp Lutscher
Matthias Mader
Joan Eliel Madia
Natalia Malancu
Luis Maldonado
Helge Marahrens
Nicole Martin
Paul Martinez
Jochen Mayerl
Oscar Jose Mayorga
Robert Myles McDonnell
Patricia A. McManus
Kyle Wagner
Cecil Meeusen
Daniel Meierrieks
Jonathan Mellon
Friedolin Merhout
Samuel Merk
Daniel Meyer
Leticia Micheli
Jonathan J.B. Mijs
Cristóbal Moya
Marcel Neunhoeffer
Daniel Nüst
Olav Nygård
Fabian Ochsenfeld
Gunnar Otte
Anna Pechenkina
Mark Pickup
Christopher Prosser
Louis Raes
Kevin Ralston
Miguel Ramos
Frank Reichert
Arne Roets
Jonathan Rogers
Guido Ropers
Robin Samuel
Gregor Sand
Constanza Sanhueza Petrarca
Ariela Schachter
Merlin Schaeffer
David Schieferdecker
Elmar Schlueter
Katja Schmidt
Regine Schmidt
Alexander Schmidt-Catran
Claudia Schmiedeberg
Jürgen Schneider
Martijn Schoonvelde
Julia Schulte-Cloos
Sandy Schumann
Reinhard Schunck
Juergen Schupp
Julian Seuring
Henning Silber
Willem W. A. Sleegers
Nico Sonntag
Alexander Staudt
Nadia Steiber
Nils Steiner
Sebastian Sternberg
Dieter Stiers
Dragana Stojmenovska
Nora Storz
Erich Striessnig
Anne-Kathrin Stroppe
Jordan Suchow
Janna Teltemann
Andrey Tibajev
Brian B. Tung
Giacomo Vagni
Jasper Van Assche
Meta van der Linden
Jolanda van der Noll
Arno Van Hootegem
Stefan Vogtenhuber
Bogdan Voicu
Fieke Wagemans
Nadja Wehl
Hannah Werner
Brenton M. Wiernik
Fabian Winter
Christof Wolf
Cary Wu
Yuki Yamada
Björn Zakula
Nan Zhang
Conrad Ziller
Stefan Zins
Tomasz Żółtak
Publication Year:
2021

Abstract

This paper reports findings from a crowdsourced replication. Eighty-five independent teams attempted a computational replication of results reported in an original study of policy preferences and immigration by fitting the same statistical models to the same data. The replication involved an experimental condition: random assignment placed participating teams into either a transparent group, which received the original study and code, or an opaque group, which received only a methods section, a rough description of the results, and no code. The transparent group mostly verified the numerical results of the original study with the same sign and p-value threshold (95.7%), while the opaque group was less successful (89.3%). Exact numerical reproductions to the second decimal place were far less common (76.9% and 48.1%), and the share of teams that verified at least 95% of all effects in all models they ran was 79.5% and 65.2%, respectively. The reliability we quantify therefore depends on how reliability is defined, but most definitions suggest it would take a minimum of three independent replications to achieve reliability. Qualitative investigation of the teams’ workflows reveals many causes of error, including mistakes and procedural variations. Although minor error across researchers is not surprising, we show that it occurs where it is least expected: in computational reproduction. Even when we curate the results to boost ecological validity, the error remains large enough to undermine reliability between researchers to some extent. The presence of inter-researcher variability may explain part of the current “reliability crisis” in the social sciences, because it may go undetected in all forms of research involving data analysis. The most obvious implication of our study is the need for more transparency. Broader implications are that researcher variability adds a meta-source of error that may not derive from conscious measurement or modeling decisions, and that replications alone cannot resolve this type of uncertainty.
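To make the "minimum of three independent replications" claim concrete, the sketch below uses a simple majority-vote reading of reliability that is not part of the original study: each replication is assumed to verify a true result independently with probability p, taken from the rates reported in the abstract, and reliability is the probability that a strict majority of k replications verifies it. The function name majority_verification_prob and the binomial framing are illustrative assumptions, not the authors' definition.

    from math import comb

    def majority_verification_prob(p: float, k: int) -> float:
        """Probability that a strict majority of k independent replications
        verify the result, assuming each verifies with probability p."""
        needed = k // 2 + 1  # smallest strict majority
        return sum(comb(k, j) * p**j * (1 - p)**(k - j)
                   for j in range(needed, k + 1))

    # Per-replication verification rates reported in the abstract
    rates = [("transparent, same sign & p-value threshold", 0.957),
             ("opaque, same sign & p-value threshold", 0.893),
             ("opaque, exact to second decimal", 0.481)]

    for label, p in rates:
        for k in (1, 3, 5):
            print(f"{label}: k={k} -> {majority_verification_prob(p, k):.3f}")

Under these assumptions, a single transparent-style replication verifies a result about 95.7% of the time, while a best-two-of-three check pushes that above 99%; under the stricter exact-reproduction definition (48.1%), even three replications would not reach a majority verification more often than not, which mirrors the abstract's point that the answer depends on how reliability is defined.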

Details

Language:
English
Database:
OpenAIRE
Accession number:
edsair.doi.dedup.....687f28e57fcc17aca79491b1e691e191