Big data classification using heterogeneous ensemble classifiers in Apache Spark based on MapReduce paradigm.

Authors :
Kadkhodaei, Hamidreza
Eftekhari Moghadam, Amir Masoud
Dehghan, Mehdi
Source :
Expert Systems with Applications. Nov2021, Vol. 183, pN.PAG-N.PAG. 1p.
Publication Year :
2021

Abstract

• A distributed heterogeneous ensemble is designed for big data classification.
• Classifiers are pruned from the ensemble to increase diversity.
• A Spark version of DHBoost is presented, based on the MapReduce programming paradigm.
• DHBoost outperforms the state-of-the-art ensemble classifiers in the Spark library.

In this era of big data, processing large-scale data efficiently and accurately has become a challenging problem. Ensemble classification is a type of supervised learning that uses multiple experts to generate the final output. It provides a way to classify data more accurately. Because they combine multiple classifiers, ensembles are often more complicated than single classifiers, especially for big data problems. Apache Spark is a unified analytics engine for big data processing that provides a scalable framework for analyzing data. In this paper, we first extend our previous work and design a distributed heterogeneous ensemble classifier, inspired by the boosting approach, that is capable of dealing with big datasets. Using heterogeneous classifiers makes it possible to have more diverse classifiers, and consequently a more accurate classifier is obtained. Then, we present the Spark version of the proposed approach, which uses the MapReduce paradigm to speed up our heterogeneous ensemble classifier. To evaluate our approach, we applied it to seven big datasets. Extensive experimental results indicate the superiority of the proposed method over the existing ensemble algorithms implemented in Spark MLlib in terms of classification accuracy, performance, and scalability. [ABSTRACT FROM AUTHOR]
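
The general idea of a heterogeneous ensemble trained in a map step and combined in a reduce step can be illustrated with a minimal sketch. This is not the authors' DHBoost algorithm: it omits the boosting-style weighting and the pruning step, runs in plain Python rather than Spark, and combines models by simple majority vote. All names here (train_centroid, train_stump, map_partition, ensemble_predict) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def train_centroid(rows):
    """Nearest-centroid classifier: one mean point per class."""
    sums, counts = {}, Counter()
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Pick the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def train_stump(rows):
    """Decision stump: best single-feature threshold on this partition."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = Counter(y for x, y in rows if x[f] <= t)
            right = Counter(y for x, y in rows if x[f] > t)
            l_lab = left.most_common(1)[0][0]
            r_lab = right.most_common(1)[0][0] if right else majority
            errors = sum(
                1 for x, y in rows if (l_lab if x[f] <= t else r_lab) != y
            )
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def map_partition(rows):
    """Map step: train two different model types on one data partition."""
    return [train_centroid(rows), train_stump(rows)]


def ensemble_predict(models, x):
    """Combine the heterogeneous models by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Toy data split into two partitions; each row is (features, label).
partitions = [
    [([1.0, 1.0], 0), ([1.5, 2.0], 0), ([5.0, 5.5], 1), ([6.0, 5.0], 1)],
    [([0.5, 1.5], 0), ([2.0, 1.0], 0), ([5.5, 6.0], 1), ([6.5, 5.5], 1)],
]

# Reduce step: concatenate the per-partition model lists into one ensemble.
ensemble = reduce(lambda a, b: a + b, map(map_partition, partitions), [])

print(ensemble_predict(ensemble, [1.0, 1.2]))
print(ensemble_predict(ensemble, [6.0, 6.0]))
```

In a Spark setting the map step would typically run once per data partition on the cluster, and the reduce step would gather the trained models on the driver; the sketch above only mimics that structure on in-memory lists.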

Details

Language :
English
ISSN :
09574174
Volume :
183
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
152187541
Full Text :
https://doi.org/10.1016/j.eswa.2021.115369