The Reproducibility of Statistical Results in Psychological Research: An Investigation Using Unpublished Raw Data

Psychological Methods (2021)

Abstract
We investigated the reproducibility of the major statistical conclusions drawn in 46 articles published in 2012 in three APA journals. After identifying 232 key statistical claims, we tried to reproduce, for each claim, the test statistic, its degrees of freedom, and the corresponding p value, starting from the raw data provided by the authors and closely following the Method section of the article. Of the 232 claims, we were able to successfully reproduce 163 (70%), 18 of them only by deviating from the article's analytical description. Thirteen (7%) of the 185 claims deemed significant by the authors were no longer so. The reproduction successes were often the result of cumbersome and time-consuming trial-and-error work, suggesting that APA-style reporting in conjunction with raw data makes numerical verification hard at best, if not impossible. This article discusses the types of mistakes we could identify and the tediousness of our reproduction efforts in light of a newly developed taxonomy for reproducibility. We then link our findings with other empirical research on this topic, give practical recommendations on how to achieve reproducibility, and discuss the challenges of large-scale reproducibility checks as well as promising ideas that could considerably increase the reproducibility of psychological research.

Translational Abstract

Reproducible findings, that is, findings that can be verified by an independent researcher using the same data and repeating the exact same calculations, are a pillar of empirical scientific research. We investigated the reproducibility of the major statistical conclusions drawn in 46 scientific articles from 2012. After identifying over 200 key statistical conclusions in those articles, we tried to reproduce, for each conclusion, the underlying statistical results, starting from the raw data provided by the authors and closely following the descriptions in the article.
We were unable to successfully reproduce the underlying statistical results for almost one third of the identified conclusions. Moreover, around 5% of these conclusions no longer hold. Successful reproductions were often the result of cumbersome and time-consuming trial-and-error work, suggesting that the prevailing reporting style in psychology makes verification of statistical results through independent reanalysis hard at best, if not impossible. This work discusses the types of mistakes we could identify and the tediousness of our reproduction efforts in light of a newly developed taxonomy for reproducibility. We then link our findings with other empirical research on this topic, give practical recommendations on how to achieve reproducibility, and discuss the challenges of large-scale reproducibility checks as well as promising ideas that could considerably increase the reproducibility of psychological research.
Key words
reanalysis, reproducible research, reporting errors, p values, transparency