A distributed frequent itemset mining algorithm using Spark for Big Data analytics.

Authors :: Zhang, Feng
Liu, Min
Gui, Feng
Shen, Weiming
Shami, Abdallah
Ma, Yunlong
Source :: Cluster Computing. Dec2015, Vol. 18 Issue 4, p1493-1501. 9p.
Publication Year :: 2015
Abstract: Frequent itemset mining is an essential step in association rule mining. In the big data era, conventional approaches for mining frequent itemsets encounter significant challenges when computing power and memory space are limited. This paper proposes an efficient distributed frequent itemset mining algorithm (DFIMA) that significantly reduces the number of candidate itemsets by applying a matrix-based pruning approach. The proposed algorithm has been implemented using Spark to further improve the efficiency of iterative computation. Numerical experiments on standard benchmark datasets, comparing the proposed algorithm with the existing parallel FP-growth algorithm, show that DFIMA has better efficiency and scalability. In addition, a case study has been carried out to validate the feasibility of DFIMA. [ABSTRACT FROM AUTHOR]
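
Illustrative note: the abstract does not give the details of DFIMA, so the following is only a minimal single-machine sketch of the general idea it builds on, namely Apriori-style candidate generation with support counted over a boolean item-by-transaction matrix. It is not the authors' algorithm and does not use Spark; the function name `frequent_itemsets` and the parameter `min_support` (an absolute transaction count) are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style mining over a boolean item-by-transaction matrix.

    Hypothetical helper for illustration only; not the DFIMA algorithm.
    Returns a dict mapping each frequent itemset (sorted tuple) to its support.
    """
    items = sorted({item for t in transactions for item in t})
    # One boolean row per item, one column per transaction (the "matrix").
    rows = {i: [i in t for t in transactions] for i in items}

    def support(itemset):
        # AND the item rows column-wise and count the surviving transactions.
        return sum(all(col) for col in zip(*(rows[i] for i in itemset)))

    # Level 1: keep only single items that meet the support threshold.
    current = [(i,) for i in items if support((i,)) >= min_support]
    result = {c: support(c) for c in current}
    k = 2
    while current:
        frequent_prev = set(current)
        # Join step: merge (k-1)-itemsets that share their first k-2 items.
        candidates = {
            a + (b[-1],)
            for a in current
            for b in current
            if a[:-1] == b[:-1] and a[-1] < b[-1]
        }
        # Prune step: drop candidates with any infrequent (k-1)-subset,
        # so their support never has to be counted.
        candidates = [
            c for c in candidates
            if all(s in frequent_prev for s in combinations(c, k - 1))
        ]
        current = sorted(c for c in candidates if support(c) >= min_support)
        result.update({c: support(c) for c in current})
        k += 1
    return result
```

For example, calling `frequent_itemsets([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}], 3)` keeps the three single items and the three pairs, while the triple is counted and rejected because only two transactions contain it. A distributed version would partition the transactions (the matrix columns) across workers and sum the partial counts.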

Subjects :: *BIG data
*DISTRIBUTED algorithms
*DATA mining
*APRIORI algorithm
*ITERATIVE methods (Mathematics)
