1. An Overview of the Apache Airavata Software Stack for Science Gateways
- Author
Pierce, Marlon; Abeysinghe, Eroma; Christie, Marcus; Coulter, Eric; Marru, Suresh; Pamidighantam, Sudhakar; Quick, Rob; Ranawaka, Isuru; Wang, Jun; Wannipurage, Dimuthu
- Subjects
Cybersecurity, Managed file transfer, Science gateways, Scientific workflows, Web portal, Open source software
- Abstract
Tutorial length: 90 minutes. Skill level: Any. Technology requirements: None.

Since its inception in the Apache Software Foundation in 2011, Apache Airavata has evolved from a middleware system for supporting science gateway workflow executions to a comprehensive set of semi-autonomous subsystems that can be used to provide solutions for a wide range of science gateways. This tutorial provides a series of lightning overviews of each of these major subsystems and illustrates their usage in different science gateways.

The Virtual Cluster System provides a mechanism for creating dynamic virtual clusters on OpenStack-based clouds. These virtual clusters can be used to execute both serial and parallel containerized scientific applications, providing users and gateways with their own private clusters. They can also be deployed with the JupyterHub interface, providing on-demand access to JupyterLab servers.

Apache Airavata’s metadata and workflow scheduling infrastructure (the original core of Apache Airavata) builds on Apache Helix and Airavata’s own metadata management system to manage the full lifecycle of job executions, capturing the metadata needed to audit and reproduce execution outcomes.

The Airavata Django Portal provides an out-of-the-box end-user environment for all of the Apache Airavata middleware subsystems. Through the use of the Wagtail Content Management System and the Django Apps extension mechanism, the Airavata Django Portal can be extensively customized to create unique user interfaces that meet the usability requirements of different research communities.

Airavata Custos encompasses Apache Airavata’s security services for managing user accounts; federated authentication; role, group, and attribute-based authorization; sharing and permissions; and resource credential (secrets) management. Custos services can be used independently of other Airavata services and can be integrated into other science gateway platforms such as Galaxy through the Custos API.

The Airavata Managed File Transfer (MFT) subsystem supports data transfer and storage endpoint management for users’ local storage systems, parallel file and mass storage systems operated by research computing systems, and cloud storage systems such as Amazon S3, Google Drive, and Box. Central MFT services and locally deployed agents can support emerging high-performance transfer protocols and provide optimized transfers that are decoupled from gateway middleware.

Airavata Data Lake provides secured, controlled access to data from a wide range of sources, including scientific instruments, results of computations, and user- and machine-annotated metadata. The Airavata Data Lake system can orchestrate data movements managed by Airavata MFT and execute data pipelines to extract searchable metadata. Collectively, these components enable the processing of data, the movement of data from data sources to central storage points, and the distribution of data to the respective authorized users.

The Science Gateways Platform as a Service (SciGaP) is an operational deployment of the Airavata software stack that the Indiana University Cyberinfrastructure Integration Research Center runs for over 40 client gateways. We conclude the tutorial with a discussion of future directions for the Apache Airavata software stack and gateways in general, including greater support for FAIR science and secure integration of a greater number of edge systems.
- Published
- 2021