
Performance Optimization System for Hadoop and Spark Frameworks

Authors :
Astsatryan, Hrachya
Kocharyan, Aram
Hagimont, Daniel
Lalayan, Arthur
Source :
Cybernetics and Information Technologies; December 2020, Vol. 20, Issue 6, pp. 5-17, 13p
Publication Year :
2020

Abstract

The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, splits large data sets into blocks distributed across several machines. Data compression reduces data size and transfer time between disk and memory, but requires additional processing. Finding an optimal tradeoff is therefore a challenge, as a high compression factor may underload Input/Output but overload the processor. The paper presents a system that selects the compression tools and tunes the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
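The tradeoff the abstract describes can be illustrated with a minimal stand-alone sketch. This is not the authors' system; it simply profiles compressed size against compression time across levels of Python's standard-library zlib codec, the same size-versus-CPU tension that arises when tuning a compression factor in Hadoop or Spark. The sample data and level choices are illustrative assumptions.

```python
import time
import zlib


def profile_levels(data: bytes, levels=(1, 5, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results[level] = (len(compressed), elapsed)
    return results


# Hypothetical workload: repetitive record-like data, as in shuffle or log blocks.
sample = b"timestamp,value,status\n" * 50_000
for level, (size, secs) in profile_levels(sample).items():
    ratio = len(sample) / size
    print(f"level {level}: ratio {ratio:.1f}x in {secs * 1000:.2f} ms")
```

Higher levels generally shrink the data further (lighter I/O load) at the cost of more CPU time, which is the quantity an optimizer like the one described here must balance per infrastructure.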

Details

Language :
English
ISSN :
1311-9702 and 1314-4081
Volume :
20
Issue :
6
Database :
Supplemental Index
Journal :
Cybernetics and Information Technologies
Publication Type :
Periodical
Accession number :
ejs55124500
Full Text :
https://doi.org/10.2478/cait-2020-0056