Optimizing Network Transfers for Data Analytic Jobs Across Geo-Distributed Datacenters.

Authors :
Chen, Li
Liu, Shuhao
Li, Baochun
Source :
IEEE Transactions on Parallel & Distributed Systems. Feb 2022, Vol. 33 Issue 2, p403-414. 12p.
Publication Year :
2022

Abstract

It has become a recent trend that large volumes of data are generated, stored, and processed across geographically distributed datacenters. When popular data parallel frameworks, such as MapReduce and Spark, are employed to process such geo-distributed data, optimizing the network transfer in communication stages becomes increasingly crucial to application performance, as the inter-datacenter links have much lower bandwidth than intra-datacenter links. In this article, we focus on exploiting the flexibility of multi-path routing for inter-datacenter flows of data analytic jobs, with the hope of better utilizing inter-datacenter links and thus improving job performance. We design an optimal multi-path routing and scheduling strategy to achieve the best possible network performance for all concurrent jobs, based on our formulation of an optimization problem that can be transformed into an equivalent linear programming (LP) problem to be efficiently solved. As a highlight of this article, we have implemented our proposed algorithm in the controller of an application-layer software-defined inter-datacenter overlay testbed, designed to provide transfer optimization service for Spark jobs. With extensive evaluations of our real-world implementation on Google Cloud, we have shown convincing evidence that our optimal multi-path routing and scheduling strategies have achieved significant improvements in terms of job performance. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1045-9219
Volume :
33
Issue :
2
Database :
Academic Search Index
Journal :
IEEE Transactions on Parallel & Distributed Systems
Publication Type :
Academic Journal
Accession number :
153095246
Full Text :
https://doi.org/10.1109/TPDS.2021.3093232