Silhouette: Efficient Cloud Configuration Exploration for Large-Scale Analytics
- Source :
- IEEE Transactions on Parallel and Distributed Systems. 32:2049-2061
- Publication Year :
- 2021
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2021.
- Abstract :
- Choosing the best cloud configuration for large-scale data analytics jobs deployed in the cloud can substantially improve their performance and reduce costs. However, current cloud providers offer a wide variety of instance types and customized cluster sizes, making it both time-consuming and costly to pinpoint the optimal cloud configuration. This article presents the design, implementation, and evaluation of Silhouette, a cloud configuration selection framework based on performance models for various large-scale analytics jobs with minimal training overhead. The essence of Silhouette is to build performance prediction models with carefully selected small-scale experiments on small subsets of input data to estimate the performance with entire input data on larger cluster sizes. To reduce the training time and cost, Silhouette incorporates new statistical techniques to select those experiments that yield the best possible information for performance prediction. Moreover, we develop a novel model transformer to convert a prediction model built on one instance type to a different instance type with only one extra experiment, which significantly reduces the training overhead. We evaluate Silhouette with an extensive array of large-scale data analytics jobs on Amazon EC2. Our experimental results have shown convincing evidence that Silhouette is effective in optimizing cloud configuration while saving both training time and costs compared with existing solutions.
- Subjects :
- 020203 distributed computing
Computer science
Cloud computing
02 engineering and technology
Silhouette
Data modeling
Computational Theory and Mathematics
Hardware and Architecture
Analytics
Signal Processing
0202 electrical engineering, electronic engineering, information engineering
Performance prediction
Data analysis
Overhead (computing)
Data mining
Transformer (machine learning model)
Details
- ISSN :
- 2161-9883 and 1045-9219
- Volume :
- 32
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Parallel and Distributed Systems
- Accession number :
- edsair.doi...........ce8390c537795a7c7c58c7649c877b31
- Full Text :
- https://doi.org/10.1109/tpds.2021.3058165
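
The idea summarized in the abstract (fit a performance model on small-scale runs, extrapolate to the full input on a larger cluster, then adapt the model to another instance type from one extra run) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the model form `runtime = a + b * (data_fraction / machines)`, the ratio-based rescaling in `transfer`, the function names, and the sample runtimes.

```python
# Illustrative sketch only. The model form and the rescaling step are
# assumptions for this example, not the method described in the paper.

def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(model, data_fraction, machines):
    """Predicted runtime (seconds) for a data fraction on a cluster size."""
    a, b = model
    return a + b * (data_fraction / machines)

def transfer(model, experiment, observed_runtime):
    """Hypothetical transfer step: rescale the model to another instance
    type using the observed/predicted ratio from a single extra run."""
    fraction, machines = experiment
    ratio = observed_runtime / predict(model, fraction, machines)
    a, b = model
    return a * ratio, b * ratio

# Made-up small-scale runs: (data fraction, machines, runtime in seconds).
runs = [(0.01, 1, 14.0), (0.02, 1, 24.0), (0.02, 2, 14.0),
        (0.04, 2, 24.0), (0.04, 4, 14.0)]

# Fit on the per-machine data share, then extrapolate to the full input.
base_model = fit_linear([f / m for f, m, _ in runs], [t for _, _, t in runs])
full_on_16 = predict(base_model, 1.0, 16)

# One extra (made-up) run on a second instance type adapts the model.
other_model = transfer(base_model, (0.02, 1), 12.0)
other_full_on_16 = predict(other_model, 1.0, 16)

print(full_on_16, other_full_on_16)
```

The sketch trains only on runs that use at most 4% of the input and at most four machines, which is the source of the training-cost savings the abstract refers to. How well such an extrapolation holds depends on how closely the assumed model form matches the job's actual scaling behavior.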