Toward Long-Term and Archivable Reproducibility

Authors :
Akhlaghi, Mohammad
Infante-Sainz, Raúl
Roukema, Boudewijn F.
Khellat, Mohammadreza
Valls-Gabaud, David
Baena-Gallé, Roberto
Source :
Computing in Science & Engineering (2021), vol. 23, issue 3, pp. 82-91
Publication Year :
2020

Abstract

Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be readable, executable, or sustainable in the long term. A set of criteria is introduced to address this problem: Completeness (no execution requirement beyond a minimal Unix-like operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free and open source software. As a proof of concept, we introduce "Maneage" (Managing data lineage), enabling cheap archiving, provenance extraction, and peer verification that has been tested in several research publications. We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility. The caveats (with proposed solutions) are then discussed and we conclude with the benefits for the various stakeholders. This article is itself a Maneage'd project (project commit 54e4eb2).

Comment: Published version. The downloadable source (on arXiv) includes the full/automatic reproduction resources (scripts, config files and input data links). Git repository: https://git.maneage.org/paper-concept.git (also on Software Heritage), Zenodo: https://doi.org/10.5281/zenodo.3872247

Details

Database :
arXiv
Journal :
Computing in Science & Engineering (2021), vol. 23, issue 3, pp. 82-91
Publication Type :
Report
Accession number :
edsarx.2006.03018
Document Type :
Working Paper
Full Text :
https://doi.org/10.1109/MCSE.2021.3072860