1. Democratizing Access to Data with OneDataShare
- Author
Jacob Goldverg, Elvis Rodrigues, Hasibul Jamil, and Tevfik Kosar
- Subjects
Managed file transfer, big data, cloud computing, throughput optimization, data management, Gateways 2022, protocol translation, Gateways22
- Abstract
Today, science communities face diverse, distributed, and large volumes of data that are difficult to access and move over Wide Area Networks (WAN) with standard utilities. The challenges include heterogeneity of data storage end-systems, non-interoperable data transfer protocols, highly fluctuating shared network links, frequent network outages, difficulty in optimally setting the tunable transfer parameters and accurately predicting data delivery time, the reliability and security of file transfers, efficient use of end-system and network resources, fairness among all users accessing the same set of resources, and hiding all of these complexities from end users. As the need for remote data access and transfer grows, so does the impact of these issues on the science communities that depend on these data sets for their research. OneDataShare (ODS) is a cloud-hosted managed file transfer service that aims to overcome these challenges. It provides (1) optimization of end-to-end data transfers and reduction of the time to delivery of the data; (2) interoperation across heterogeneous data resources and on-the-fly inter-protocol translation; (3) an intuitive web interface that makes file transfer and monitoring easy from any device and location; and (4) a reliable and secure file transfer service that is open source and free to use, democratizing access to data.
ODS was initially developed as a monolithic Java application that contained all of the features a SaaS user could experience [2]. That SaaS architecture could not accommodate users installing the Transfer-Service (TS) on their own hosts to enable direct access to the file system and to get around the restrictions of low-privileged users. To address these challenges, ODS pivoted to a micro-services-based architecture that focuses on maximizing the throughput of heterogeneous file transfers, real-time thread tuning based on end-system resource usage, retry capabilities, and encryption at rest and in transit for all credentials. The benefit of this architecture is the ability to deploy the Transfer-Service on users' hosts, giving it direct access to the file system and more efficient use of system resources for the user's data transfers, while it connects to the ODS back-end and uses the monitoring and optimization service.
- Published
- 2022