Jamthe, Anagha; Stubbs, Joe; Packard, Mike; Chuah, Joon Yee; Looney, Julia; and Curbelo, Gilbert
Over the last 15 years, significant investment has been made in providing web access to advanced computing resources for computational research. Historically, such web applications, or “science gateways”, have enabled users to run analyses asynchronously on remote systems. More recently, as a growing number of disciplines bring big data techniques to bear on fundamental problems, interactive computing environments such as Jupyter Notebooks have gained tremendous popularity for the ease with which users can perform a range of computational tasks in real time, including data cleansing, analysis, visualization, and post-processing. Nevertheless, to date there is no national-scale offering that provides production-grade, scalable interactive computing deeply integrated into the academic cyberinfrastructure (CI) ecosystem. The Texas Advanced Computing Center (TACC) recently launched the Scientific and Interactive Computing (SCINCO) project to provide a hardened, production-grade Jupyter-notebooks-as-a-service platform capable of utilizing advanced storage and computing CI. Launched in 2020, SCINCO builds upon a custom JupyterHub offering that TACC has developed since 2015. TACC’s custom JupyterHub supports more than 1600 users across five different clusters at TACC and has become a crucial component of independently funded gateway projects such as DesignSafe Cyber-Infrastructure, the Synergistic Data Discovery Environment, 3DEM, and HETDEX. Researchers in fields including astronomy, biology, climate science, and neuroscience are leveraging SCINCO to analyze big data, implement computational models, disseminate results, and train researchers. SCINCO addresses the scalability challenges faced by TACC’s original JupyterHub by combining state-of-the-art open source container technologies such as Kubernetes with a customizable JupyterHub, delivering a platform capable of serving dozens or even hundreds of projects (i.e., “tenants”) with minimal developer overhead.
SCINCO executes notebook containers on a shared Kubernetes cluster running on bare-metal servers and uses Kubernetes namespaces to isolate projects from one another. SCINCO inherits the advantages of Kubernetes, including scalability, reproducibility, and portability, while significantly reducing the administrative overhead associated with managing servers. For example, TACC’s original JupyterHub used thirty 16 GB virtual machines (480 GB of RAM in total) arranged into five different clusters; SCINCO can provide the same computational power, with better load balancing, using just two bare-metal servers with 256 GB of RAM each. When launching a user’s notebook server, SCINCO’s customizations (Figure 1) look up the user ID and group ID associated with the username. Additionally, SCINCO dynamically determines at runtime which persistent volumes the user has access to. SCINCO also supports configuration changes at runtime, such as adding or removing volume mounts for a specific user group, changing the available notebook images, or updating the list of admin users.

The SCINCO project is currently developing an administrative portal that will empower project administrators to manage their own JupyterHub clusters without direct support from project staff. Project administrators will have access to a dashboard where they can manage users, custom images, and volume mounts for individual users. They will also be able to start and stop users’ notebook servers and open shell sessions on them. The SCINCO project made its initial production release in July 2021, and the administrative portal will be released to early adopters in Fall 2021.
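The spawn-time customizations described above (UID/GID lookup, per-user resolution of persistent volumes, and configuration that can change at runtime) can be illustrated with a minimal sketch. It does not use JupyterHub or KubeSpawner and makes no claim about how SCINCO itself passes these settings to its spawner. Instead, it assumes a hypothetical in-memory user directory (`directory`) and a hypothetical per-tenant configuration object (`TenantConfig`), and builds the pod-level settings a spawner would need. The output uses real Kubernetes pod-spec field names (`securityContext.runAsUser`/`runAsGroup`, `volumes`, `volumeMounts`, `persistentVolumeClaim.claimName`); the helper names and the group-to-volume mapping are illustrative assumptions.

```python
from dataclasses import dataclass, field


# Hypothetical identity record. The real source of user/group IDs used by
# SCINCO (e.g. a site directory service) is not specified; a dict stands in.
@dataclass(frozen=True)
class Identity:
    uid: int
    gid: int
    groups: frozenset


# Hypothetical per-tenant configuration. Because it is read on every spawn,
# edits to it apply to the next server launch without restarting anything.
@dataclass
class TenantConfig:
    namespace: str  # Kubernetes namespace isolating this tenant
    image: str      # notebook image for this tenant
    # group name -> list of (persistent volume claim name, mount path)
    group_volumes: dict = field(default_factory=dict)


def lookup_identity(directory: dict, username: str) -> Identity:
    """Resolve the numeric UID/GID and group memberships for a username."""
    try:
        entry = directory[username]
    except KeyError:
        raise LookupError(f"unknown user: {username}") from None
    return Identity(entry["uid"], entry["gid"], frozenset(entry["groups"]))


def build_pod_overrides(username: str, directory: dict, config: TenantConfig) -> dict:
    """Compute per-user pod settings at spawn time (illustrative only)."""
    ident = lookup_identity(directory, username)
    volumes, mounts = [], []
    # Decide which persistent volumes this user may mount, based on the
    # groups they belong to and the tenant's current configuration.
    for group in sorted(ident.groups):
        for claim, path in config.group_volumes.get(group, []):
            vol_name = f"vol-{claim}"
            if any(v["name"] == vol_name for v in volumes):
                continue  # two groups may grant the same volume; mount it once
            volumes.append({"name": vol_name,
                            "persistentVolumeClaim": {"claimName": claim}})
            mounts.append({"name": vol_name, "mountPath": path})
    return {
        "namespace": config.namespace,
        "image": config.image,
        # Run the notebook process as the user's own UID/GID.
        "securityContext": {"runAsUser": ident.uid, "runAsGroup": ident.gid},
        "volumes": volumes,
        "volumeMounts": mounts,
    }


if __name__ == "__main__":
    directory = {"alice": {"uid": 5001, "gid": 6001, "groups": ["lab-a", "shared"]}}
    config = TenantConfig(
        namespace="tenant-a",
        image="example.org/tenant-a/notebook:latest",
        group_volumes={
            "lab-a": [("lab-a-data", "/home/jovyan/lab-a")],
            "shared": [("shared-scratch", "/home/jovyan/scratch")],
        },
    )
    print(build_pod_overrides("alice", directory, config))
    # A runtime change: granting the "shared" group another volume affects
    # the next spawn without redeploying the hub.
    config.group_volumes["shared"].append(("shared-archive", "/home/jovyan/archive"))
    print(len(build_pod_overrides("alice", directory, config)["volumes"]))  # 3
```

In this sketch, resolving volumes from group membership at spawn time, rather than fixing them in the notebook image or a static hub configuration file, is what allows an administrator to add or remove a mount for a group and have it take effect at the user's next server launch.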