1. Towards Reproducible Research on CyberGISX with Lmod and Easybuild
- Author
- Michels, Alexander; Padmanabhan, Anand; Li, Zhiyu; Wang, Shaowen
- Subjects
- Geospatial Software, cyberGIS, Easybuild, Jupyter
- Abstract
- JupyterHub [1] has become a popular choice in many scientific communities, offering an easy-to-use interface that requires little to no frontend development work while promoting reproducible and replicable (R&R) science [2]. In the broad geospatial science community, CyberGISX [3] provides such a gateway environment with many cyberGIS (i.e., geospatial information science and systems based on advanced cyberinfrastructure) and geospatial software packages prebuilt and ready to use. Like other JupyterHub-based solutions, CyberGISX provides container-based access for its users and must balance a trade-off between providing a static compute environment, which enhances R&R, and continuously updating the software environment to keep up with advances in scientific software. Solutions such as Binder [4] have attempted to address this trade-off by encoding the required dependencies in the package and building the software environment at the time of use. However, such a solution comes with two major disadvantages: (a) software is built at the time it is needed, increasing startup time and introducing the possibility that some of the environment's dependencies are no longer available or have changed; and (b) the onus of specifying and managing software installations is passed to notebook developers, many of whom are domain scientists who are not comfortable with such responsibilities. To address these challenges and enhance R&R with minimal effort from end-users, we have designed and implemented a solution on CyberGISX that keeps software on an external file server mounted into each user's environment.
Scientific software is installed with Easybuild [5] and managed by Lmod [6], which provides several benefits: (1) the compute environment is more standardized and easily reproducible outside of the gateway; (2) multiple versions of software can be made available to users without increasing container size; and (3) exact copies of the software remain available on the gateway instead of being rebuilt for every release, further enhancing R&R. We also employ an Easybuild-installed Anaconda [7] to create and manage conda environments on the file server. The combination of the software stack from Easybuild and the Python environment from conda provides end-users with kernels for their Jupyter notebooks that persist unchanged as the gateway's container is updated. This design enhances R&R and adds functionality for advanced users without introducing technical barriers for non-technical end-users. As such, domain scientists using this solution need not build their own software or specify dependencies, which helps prevent the notebooks they have developed from being broken by the next software release. This talk explores the new architecture and applications of this solution to CyberGISX [3] and CyberGIS-Jupyter for Water (CJW) [8].
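As a rough illustration of how a notebook kernel could combine Lmod-managed software with a conda environment kept on a mounted file server, the sketch below builds a Jupyter kernel spec whose launch command loads modules before starting `ipykernel`. It is not the CyberGISX implementation: the function names, the module name `GDAL/3.2.1`, and all paths are hypothetical, and it assumes that Lmod's `module` command is initialized in login shells and that `ipykernel` is installed in the conda environment.

```python
import json
from pathlib import Path


def build_kernel_spec(display_name, python_path, modules):
    """Return a kernel spec dict that loads Lmod modules, then starts ipykernel.

    `python_path` is the interpreter of a conda environment on the shared
    file server (a hypothetical location chosen by the caller).
    """
    # `module` is a shell function set up by Lmod's init scripts, so the
    # kernel is started through a login shell (an assumption about the site).
    load_cmd = "module load " + " ".join(modules) if modules else "true"
    # "$0" and "$1" are the two extra arguments passed to bash below:
    # the interpreter path and the connection file filled in by Jupyter.
    launch = f'{load_cmd} && exec "$0" -m ipykernel_launcher -f "$1"'
    return {
        "argv": ["bash", "-lc", launch, python_path, "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


def write_kernel_spec(kernels_dir, name, spec):
    """Write the spec to <kernels_dir>/<name>/kernel.json and return the path."""
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    path = kernel_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


if __name__ == "__main__":
    # Hypothetical example values; real module names and paths are site-specific.
    spec = build_kernel_spec(
        "Python 3 (example-env)",
        "/data/conda/envs/example-env/bin/python",
        ["GDAL/3.2.1"],
    )
    print(json.dumps(spec, indent=2))
```

Because the kernel spec only points at software on the file server, a layout like this would let the container image change without altering what the kernel loads, which is the property the abstract describes.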
- Published
- 2021