Portable, Scalable, and Reproducible Scientific Computing: From Cloud to HPC
- Authors
Stubbs, Joe; Jamthe, Anagha; Black, Steve; Cleveland, Sean; and Looney, Julia
- Subjects
Machine Learning, Docker, Singularity, Jupyter, Tapis
- Abstract
This tutorial will give attendees exposure to state-of-the-art techniques for portable, reproducible research computing. These techniques let them move analyses easily from cloud to HPC resources, share computations with collaborators, and disseminate final results to communities of interest. We will introduce several open source technologies, including Jupyter, Docker, and Singularity, and show how to use them within the NSF-funded Tapis v3 platform, an Application Programming Interface (API) for distributed computation. After a brief introduction to these technologies, the tutorial will focus on hands-on exercises in which attendees build a portable analysis that can be moved seamlessly between execution environments, from a small virtual machine to a national-scale supercomputer. Using the techniques covered, attendees will also be able to share their results with one or more additional users. The tutorial uses a machine learning image classifier analysis to illustrate the concepts, but the techniques apply to a broad class of analyses in virtually any domain of science or engineering.

Description and Format: TACC training accounts will be set up for all registered attendees, with access to allocations on XSEDE cloud systems and on one or more HPC resources such as TACC's Stampede2 or Frontera. The tutorial will consist of hands-on exercises in which attendees interact with the Tapis v3 services from within a Jupyter notebook. Registered attendees will be notified of their account details closer to the tutorial date. All course materials will be published on GitHub Pages, so attendees will have access to them during and after the tutorial. Proctors will be available throughout the session to help attendees via Slack or breakout sessions. The proposed tutorial schedule is shown in Table 1.
Learning Outcomes: In this tutorial, attendees will learn the concepts behind using container technology (Docker, Singularity) for portable analysis, programmatically executing analyses in both cloud and HPC environments through an API, interacting with and visualizing results in Jupyter notebooks, and sharing results with collaborators. By the end of this tutorial, attendees will be able to:
• Demonstrate a basic understanding of Docker and Singularity containers as they relate to computational research.
• Use Tapis to access HPC storage and compute resources in a programmatic and reproducible way.
• Use Jupyter notebooks for interactive computing.
• Use Tapis to share results with others.

Content Level: Beginner 70%, Intermediate 30%.
Length: 3 hours.
Audience Prerequisites: Basic familiarity with Jupyter notebooks and Python will be helpful. Attendees must bring their own laptops for the hands-on part of the tutorial.
- Published
2021