1. A Novel Pipeline Approach for Efficient Big Data Broadcasting
- Author
-
Chi-Jen Wu, Chin-Fu Ku, Jan-Ming Ho, and Ming-Syan Chen
- Subjects
Distributed database ,business.industry ,Computer science ,Distributed computing ,Node (networking) ,Pipeline (computing) ,020206 networking & telecommunications ,Cloud computing ,02 engineering and technology ,Broadcasting ,Computer Science Applications ,Data modeling ,Upload ,Computational Theory and Mathematics ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,Information Systems - Abstract
Big-data computing is a new critical challenge for the ICT industry. Engineers and researchers are dealing with data sets of petabyte scale in the cloud computing paradigm. Thus, the demand for building a service stack to distribute, manage, and process massive data sets has risen drastically. In this paper, we investigate the Big Data Broadcasting problem for a single source node to broadcast a big chunk of data to a set of nodes with the objective of minimizing the maximum completion time. These nodes may locate in the same datacenter or across geo-distributed datacenters. This problem is one of the fundamental problems in distributed computing and is known to be NP-hard in heterogeneous environments. We model the Big-data broadcasting problem into a LockStep Broadcast Tree (LSBT) problem. The main idea of the LSBT model is to define a basic unit of upload bandwidth, $r$ , such that a node with capacity $c$ broadcasts data to a set of $\lfloor c/r\rfloor$ children at the rate $r$ . Note that $r$ is a parameter to be optimized as part of the LSBT problem. We further divide the broadcast data into $m$ chunks. These data chunks can then be broadcast down the LSBT in a pipeline manner. In a homogeneous network environment in which each node has the same upload capacity $c$ , we show that the optimal uplink rate $r^*$ of LSBT is either $c/2$ or $c/3$ , whichever gives the smaller maximum completion time. For heterogeneous environments, we present an $O(n \text{log}^2 n)$ algorithm to select an optimal uplink rate $r^*$ and to construct an optimal LSBT. Numerical results show that our approach performs well with less maximum completion time and lower computational complexity than other efficient solutions in literature.
- Published
- 2016