Performance Optimization System for Hadoop and Spark Frameworks
- Source :
- Cybernetics and Information Technologies, Vol 20, Iss 6, Pp 5-17 (2020)
- Publication Year :
- 2020
- Publisher :
- Sciendo, 2020.
-
Abstract
- The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, splits large data sets into blocks distributed across several machines. Data compression reduces data size and the transfer time between disk and memory, but requires additional processing. Finding an optimal tradeoff is therefore a challenge, as a high compression factor may underload Input/Output while overloading the processor. This paper presents a system for selecting compression tools and tuning the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
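- The I/O-versus-CPU tradeoff described in the abstract can be sketched with a toy cost model. The snippet below is an illustrative assumption, not the paper's actual system: it uses zlib compression levels as a stand-in for a tunable compression factor, and a made-up linear cost model in place of the paper's simulation-based analysis.

```python
import zlib

def transfer_cost(payload: bytes, level: int,
                  io_bytes_per_unit: int = 10_000,
                  cpu_units_per_level: float = 1.0):
    """Return (compressed size, modeled total cost) for one compression level.

    The cost model is hypothetical: I/O cost falls as data shrinks,
    while CPU cost grows with the compression level.
    """
    compressed = zlib.compress(payload, level)
    io_cost = len(compressed) / io_bytes_per_unit   # cheaper when data is smaller
    cpu_cost = level * cpu_units_per_level          # toy stand-in for CPU overhead
    return len(compressed), io_cost + cpu_cost

# A highly redundant sample "block", as might be split out by MapReduce.
data = b"hadoop spark block " * 50_000

size_fast, cost_fast = transfer_cost(data, level=1)  # fast, weaker compression
size_best, cost_best = transfer_cost(data, level=9)  # slow, stronger compression
print(size_fast, size_best)
```

Under such a model, the optimum level depends on the relative weights of I/O and CPU cost, which is exactly the balance the paper's system aims to tune per infrastructure.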
Details
- Language :
- English
- ISSN :
- 13144081 and 20200056
- Volume :
- 20
- Issue :
- 6
- Database :
- Directory of Open Access Journals
- Journal :
- Cybernetics and Information Technologies
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.12889a3dee384215a31e4c53b35f54b6
- Document Type :
- article
- Full Text :
- https://doi.org/10.2478/cait-2020-0056