OATS: online aggregation with two-level sharing strategy in cloud.

Authors :
Wang, Yuxiang
Luo, Junzhou
Song, Aibo
Dong, Fang
Source :
Distributed & Parallel Databases; Dec2014, Vol. 32 Issue 4, p467-505, 39p
Publication Year :
2014

Abstract

Online aggregation (OLA) is an attractive sampling-based technique that answers aggregation queries with an approximate estimate of the final result, accompanied by a confidence interval that becomes tighter over time. It has been built into MapReduce-based cloud systems for big data analytics, where it allows users to monitor query progress and to save money by killing the computation early once sufficient accuracy has been obtained. However, the performance of OLA is seriously limited by the lack of sharing when multiple OLA queries are processed concurrently. In the original MapReduce paradigm, each query is processed independently without considering potential sharing opportunities, which leads to two major unnecessary execution costs: (1) a large redundant I/O cost, and (2) a duplicated statistical computation cost. To eliminate these additional costs and improve overall performance, we present online aggregation with a two-level sharing strategy in cloud (OATS), based on the MapReduce framework, to effectively support online aggregation for large-scale concurrent query processing over skewed data distributions. In the first-level sharing, we propose a sample buffer management mechanism that shares sampling opportunities among multiple OLA queries to reduce the redundant I/O cost. In the second-level sharing, we propose a heuristic algorithm (with good scalability for large input) that shares partial statistics calculations to decrease the number of final aggregation operations, reducing the statistical computation cost. Based on this two-level sharing strategy, we have implemented OATS in Hadoop and conducted an extensive experimental study on the TPC-H benchmark with skewed data distributions. Our results demonstrate the efficiency and effectiveness of OATS. [ABSTRACT FROM AUTHOR]
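
The general idea in the abstract (an estimate whose confidence interval tightens as more samples arrive, with sampled rows shared among concurrent queries) can be illustrated with a minimal single-machine sketch. This is not the OATS implementation: the names `RunningMean` and `shared_online_avg`, the row layout, the normal-approximation interval, and the stopping threshold are all assumptions made for illustration, and the sketch does not model MapReduce, the sample buffer, or the second-level sharing heuristic.

```python
import math
import random


class RunningMean:
    """Running mean/variance (Welford's method) for one online AVG query."""

    def __init__(self, predicate, column):
        self.predicate = predicate  # hypothetical query filter: row -> bool
        self.column = column        # name of the aggregated column
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def update(self, row):
        if not self.predicate(row):
            return
        x = row[self.column]
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def half_width(self, z=1.96):
        """Half-width of an approximate 95% interval (normal approximation)."""
        if self.n < 2:
            return math.inf
        variance = self.m2 / (self.n - 1)
        return z * math.sqrt(variance / self.n)


def shared_online_avg(rows, queries, batch_size=1000, target=0.5, seed=0):
    """Feed each sampled row to every query, so each row is read only once."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)  # a random visiting order stands in for random sampling
    history = []
    for start in range(0, len(order), batch_size):
        for i in order[start:start + batch_size]:
            row = rows[i]        # one read of the row ...
            for q in queries:    # ... shared by all concurrent queries
                q.update(row)
        widths = [q.half_width() for q in queries]
        history.append(widths)   # progress report after each batch
        if all(w <= target for w in widths):
            break                # stop early once every estimate is tight enough
    return history
```

In this sketch, each call to `half_width` after a batch plays the role of the progress report a user would monitor, and the early `break` corresponds to killing the computation once sufficient accuracy has been reached. Because rows are visited without replacement, the normal-approximation interval is conservative for large sample fractions.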

Details

Language :
English
ISSN :
0926-8782
Volume :
32
Issue :
4
Database :
Complementary Index
Journal :
Distributed & Parallel Databases
Publication Type :
Academic Journal
Accession number :
98055486
Full Text :
https://doi.org/10.1007/s10619-014-7141-2