Automatic Performance Tuning for Distributed Data Stream Processing Systems

Authors :
Herodotos Herodotou
Lambros Odysseos
Yuxing Chen
Jiaheng Lu
Unified DataBase Management System research group / Jiaheng Lu
Department of Computer Science
Publication Year :
2022
Publisher :
IEEE, 2022.

Abstract

Distributed data stream processing systems (DSPSs) such as Storm, Flink, and Spark Streaming are now routinely used to process continuous data streams in (near) real time. However, achieving the low latency and high throughput demanded by today's streaming applications can be a daunting task, especially since the performance of DSPSs depends heavily on a large number of system parameters that control load balancing, degree of parallelism, buffer sizes, and various other aspects of system execution. This tutorial offers a comprehensive review of the state-of-the-art automatic performance tuning approaches that have been proposed in recent years. The approaches are organized into five main categories based on their methodologies and features: cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. The categories of approaches will be analyzed in depth and compared to each other, exposing their various strengths and weaknesses. Finally, we will identify several open research problems and challenges related to automatic performance tuning for DSPSs.

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....2d2f7eed1da81537baa69c9c3c7e5121