A distributed frequent itemset mining algorithm using Spark for Big Data analytics.

Authors :
Zhang, Feng
Liu, Min
Gui, Feng
Shen, Weiming
Shami, Abdallah
Ma, Yunlong
Source :
Cluster Computing. Dec2015, Vol. 18 Issue 4, p1493-1501. 9p.
Publication Year :
2015

Abstract

Frequent itemset mining is an essential step in association rule mining. Conventional approaches to mining frequent itemsets face significant challenges in the big data era, when computing power and memory space are limited. This paper proposes an efficient distributed frequent itemset mining algorithm (DFIMA) that significantly reduces the number of candidate itemsets by applying a matrix-based pruning approach. The proposed algorithm has been implemented using Spark to further improve the efficiency of iterative computation. Numerical experiments on standard benchmark datasets, comparing the proposed algorithm with an existing algorithm, parallel FP-growth, show that DFIMA has better efficiency and scalability. In addition, a case study has been carried out to validate the feasibility of DFIMA. [ABSTRACT FROM AUTHOR]
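
For readers unfamiliar with the task, the following is a minimal single-machine sketch of frequent itemset mining. It is not DFIMA and does not use Spark: the abstract gives no details of the paper's matrix-based pruning, so this sketch assumes a plain Apriori-style level-wise search over a boolean transaction matrix stored as one bitmask per item. The function name `frequent_itemsets`, the `min_support` parameter, and the sample transactions are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset found in at
    least `min_support` transactions (illustrative Apriori-style search)."""
    items = sorted({item for t in transactions for item in t})

    # Boolean transaction matrix, one column per item, packed into an int:
    # bit i of columns[item] is set when transaction i contains the item.
    columns = {
        item: sum(1 << i for i, t in enumerate(transactions) if item in t)
        for item in items
    }

    def support(mask):
        # Number of transactions whose bit is set in the mask.
        return bin(mask).count("1")

    # Level 1: frequent single items, each mapped to its column bitmask.
    current = {
        frozenset([item]): columns[item]
        for item in items
        if support(columns[item]) >= min_support
    }

    result = {}
    k = 1
    while current:
        result.update({itemset: support(mask) for itemset, mask in current.items()})
        k += 1
        candidates = {}
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k or union in candidates:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if any(frozenset(sub) not in current for sub in combinations(union, k - 1)):
                continue
            # ANDing the two masks gives the transactions containing all items.
            mask = current[a] & current[b]
            if support(mask) >= min_support:
                candidates[union] = mask
        current = candidates
    return result


# Hypothetical sample data, for illustration only.
sample = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, count in frequent_itemsets(sample, min_support=2).items():
    print(sorted(itemset), count)
```

The bitmask representation makes support counting a bitwise AND followed by a bit count, which is one common reason matrix-style encodings are used for this problem. A distributed implementation would partition the transactions across workers, which this sketch does not attempt.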

Details

Language :
English
ISSN :
13867857
Volume :
18
Issue :
4
Database :
Academic Search Index
Journal :
Cluster Computing
Publication Type :
Academic Journal
Accession number :
110952182
Full Text :
https://doi.org/10.1007/s10586-015-0477-1