Containerized Analyses Enable Interactive and Reproducible Statistics

arXiv (Cornell University) (2021)

Abstract
In recent decades, the analysis of data has become increasingly computational, and this has changed how scientific and statistical work is shared. For example, it is now commonplace for the underlying analysis code and data to be proffered alongside journal publications and conference talks. Unfortunately, sharing code faces several challenges. First, it is often difficult to take code from one computer and run it on another: configuration, version, and dependency issues frequently get in the way. Second, even if the code runs, it is often hard to understand or interact with the analysis. This makes it difficult to assess the code and its findings, for example, in a peer review process. In this paper we advocate for two practical approaches to help make sharing interactive and reproducible analyses easy: (1) analysis containerization, a technology that fully encapsulates an analysis, together with its data, code, and dependencies, into a shareable format, and (2) code notebooks, an accessible format for interacting with third-party analyses. We demonstrate that the combination of these two technologies is powerful and that containerizing interactive code notebooks can help make it easy for statisticians to share code, analyses, and ideas.
Key words
analyses, statistics