
A Distributed Methodology for Imbalanced Classification Problems.

Authors :
Lemnaru, Camelia
Cuibus, Mihai
Bona, Adrian
Alic, Andrei
Potolea, Rodica
Source :
2012 11th International Symposium on Parallel & Distributed Computing; 1/1/2012, p164-171, 8p
Publication Year :
2012

Abstract

Current challenges in data mining research stem from the need to address particularities of real-world problems, such as imbalanced data and error cost distributions. This paper presents Distributed Evolutionary Cost-Sensitive Balancing, a distributed methodology for dealing with imbalanced data and, if necessary, cost distributions. The method employs a genetic algorithm to search for an optimal cost matrix and base classifier settings, which are then used by a cost-sensitive classifier wrapped around the base classifier. Individual fitness computation is the most computationally intensive task in the algorithm, but it also offers high parallelization potential. Two parallelization alternatives have been explored: a computation-driven approach and a data-driven approach. Both have been developed within the Apache Watchmaker framework and deployed on Hadoop-based infrastructures. Experimental evaluations performed so far indicate that the computation-driven approach achieves good classification performance but does not reduce the running time significantly, whereas the data-driven approach reduces the running time for slow algorithms, such as kNN and SVM, while still yielding important performance improvements. [ABSTRACT FROM PUBLISHER]
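
The idea in the abstract (an evolutionary search over misclassification costs, with each individual's fitness evaluated independently and therefore in parallel) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the fixed logistic `base_score`, the single evolved false-negative cost (instead of a full cost matrix plus classifier settings), the mutation-only loop, and all function names are assumptions made for illustration. A thread pool stands in for the Hadoop deployment and only shows the structure of the computation-driven alternative; it does not by itself demonstrate a speed-up.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

MAX_COST = 50.0  # assumed upper bound on the false-negative cost


def make_data(rng, n_neg=300, n_pos=30):
    """Toy imbalanced 1-D data: negatives ~ N(0, 1), positives ~ N(1.5, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_neg)]
    data += [(rng.gauss(1.5, 1.0), 1) for _ in range(n_pos)]
    return data


def base_score(x):
    """Hypothetical base classifier: a fixed logistic estimate of P(y=1 | x)."""
    return 1.0 / (1.0 + math.exp(-(1.5 * x - 3.4)))


def predict(x, fn_cost, fp_cost=1.0):
    """Cost-sensitive wrapper: choose the label with the lower expected cost."""
    p = base_score(x)
    return 1 if p * fn_cost >= (1.0 - p) * fp_cost else 0


def fitness(fn_cost, data):
    """Balanced accuracy (mean of the per-class recalls) for one individual."""
    tp = fn = tn = fp = 0
    for x, y in data:
        pred = predict(x, fn_cost)
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def evolve(data, rng, pop_size=12, generations=15, workers=4):
    """Mutation-only evolutionary search with elitism and parallel fitness."""
    # Seed with the cost-neutral individual so the result never falls below it.
    population = [1.0] + [rng.uniform(1.0, MAX_COST) for _ in range(pop_size - 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Each individual is scored independently: the parallelizable step.
            scores = list(pool.map(lambda c: fitness(c, data), population))
            ranked = sorted(zip(scores, population), reverse=True)
            elite = ranked[0][1]
            parents = [c for _, c in ranked[: pop_size // 2]]
            # Children are Gaussian mutations of parents, clamped to the range.
            children = [
                min(MAX_COST, max(1.0, rng.choice(parents) + rng.gauss(0.0, 2.0)))
                for _ in range(pop_size - 1)
            ]
            population = [elite] + children
        scores = list(pool.map(lambda c: fitness(c, data), population))
    best_fit, best_cost = max(zip(scores, population))
    return best_cost, best_fit


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(rng)
    cost, fit = evolve(data, rng)
    print(f"false-negative cost: {cost:.2f}, balanced accuracy: {fit:.3f}")
```

Because the best individual is carried over unchanged between generations, the returned fitness is never lower than that of the cost-neutral setting (`fn_cost = 1.0`) on the same data. A data-driven variant would instead split `data` across workers and combine the partial confusion counts for each individual.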

Details

Language :
English
ISBNs :
9781467325998
Database :
Complementary Index
Journal :
2012 11th International Symposium on Parallel & Distributed Computing
Publication Type :
Conference
Accession number :
86490202
Full Text :
https://doi.org/10.1109/ISPDC.2012.30