1. Curious Containers: A framework for computational reproducibility in life sciences with support for Deep Learning applications
- Authors
Peter Hufnagl, Felix Bartusch, Jonas Annuscheit, Michael Witt, Bruno Schilling, Christian Herta, Dagmar Krefting, Klaus Strohmenger, and Christoph Jansen
- Subjects
Decision support system; Reproducibility; Computer Networks and Communications; business.industry; Computer science; Deep learning; Interoperability; 020206 networking & telecommunications; 02 engineering and technology; computer.software_genre; Software framework; Software; Workflow; Hardware and Architecture; Container (abstract data type); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; Reference implementation; business; Software engineering; computer
- Abstract
In clinical scenarios, there is increasing interest in complex computational experiments, such as the training of Deep Learning models. Reproducibility is an essential property of such experiments, especially if the result contributes to a patient’s treatment. This paper introduces Curious Containers, a software framework for computational reproducibility that treats data, software and runtime environment as decentralized network resources. All experiment resources are described in a single file, using a new format that is compatible with a subset of the Common Workflow Language. Docker is used to deploy the experiment software in a container image, including arbitrary data transmission programs to connect with existing storage solutions. The framework supports Deep Learning applications, which place high demands on storage and processing capabilities. Large datasets can be mounted inside containers via network filesystems like SSHFS, which is based on the Filesystem in Userspace (FUSE) technology. The NVIDIA Container Toolkit enables GPU usage. Curious Containers has been tested in two biomedical scenarios. The first use case is a Deep Learning application for tumor classification in images that requires a large dataset and a GPU. In this context, a prototypical integration of the framework with the existing Data Version Control system for exploratory Deep Learning modeling has been developed. The second use case extends an existing container image, including a scientific workflow for detection and comparison of human protein in mass spectrometry data. The container image was originally developed for an archiving platform and could be extended to be compatible with both Curious Containers and cwltool, the Common Workflow Language reference implementation. The presented solution allows for consistent description and execution of computational experiments, while aiming to be both flexible and interoperable with existing software and standards.
Support for Deep Learning experiments is gaining importance as such systems are increasingly validated as medical decision support systems.
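
The idea of describing all experiment resources in a single file can be illustrated with a minimal sketch. The abstract does not give the file format, so the layout below is an assumption: the top-level keys (`cli`, `inputs`, `outputs`, `container`, `connector`), the connector names, the host names and the image name are all hypothetical. Only the fields inside `cli` follow the Common Workflow Language `CommandLineTool` vocabulary. The two helper functions are likewise illustrative and not part of the framework.

```python
import json

# Hypothetical single-file experiment description. The top-level layout and
# every name outside the "cli" section are illustrative assumptions, not the
# framework's actual schema.
experiment = {
    # Software interface: a CWL-style command-line tool description.
    "cli": {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": "train-model",  # hypothetical program name
        "inputs": {
            "dataset": {"type": "Directory", "inputBinding": {"position": 0}},
        },
        "outputs": {
            "model": {"type": "File", "outputBinding": {"glob": "model.h5"}},
        },
    },
    # Data: each input is a network resource fetched or mounted by a connector.
    "inputs": {
        "dataset": {
            "connector": "sshfs-mount",  # hypothetical connector name
            "mount": True,  # mount the large dataset instead of copying it
            "access": {"host": "storage.example.org", "path": "/data/images"},
        },
    },
    "outputs": {
        "model": {
            "connector": "http-upload",  # hypothetical connector name
            "access": {"url": "https://results.example.org/model.h5"},
        },
    },
    # Runtime environment: container image plus an optional GPU requirement.
    "container": {
        "engine": "docker",
        "image": "registry.example.org/tumor-classifier:1.0",
        "gpus": {"vendor": "nvidia", "count": 1},
    },
}


def missing_sections(description):
    """Return the required top-level sections that are absent or empty."""
    required = ("cli", "inputs", "outputs", "container")
    return [key for key in required if not description.get(key)]


def unbound_inputs(description):
    """Return CLI inputs that have no concrete resource assigned to them."""
    declared = description["cli"]["inputs"]
    provided = description.get("inputs", {})
    return sorted(name for name in declared if name not in provided)


if __name__ == "__main__":
    # The whole experiment serializes to one JSON document, i.e. a single file.
    print(json.dumps(experiment, indent=2))
    print("missing sections:", missing_sections(experiment))
    print("unbound inputs:", unbound_inputs(experiment))
```

Keeping data, software and runtime references in one document means a simple consistency check can run before any container is started, which is the kind of consistent description the abstract argues for.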
- Published
- 2020