A MapReduce solution for incremental mining of sequential patterns from big data.
- Source :
- Expert Systems with Applications. Nov 2019, Vol. 133, p109-125. 17p.
- Publication Year :
- 2019
Abstract
- • A two-phase MapReduce algorithm is proposed for incremental mining of sequential patterns.
• Backward mining makes use of the knowledge obtained during the previous mining process.
• The Co-occurrence Reverse Map data structure efficiently generates the candidate sequences.
• Candidate generation rules avoid the generation of too many false candidates.
• Three novel early prune properties are introduced based on the study of item co-occurrences.

Sequential Pattern Mining (SPM) is a popular data mining task with broad applications. With the advent of big data, traditional SPM algorithms no longer scale. Hence, many researchers have migrated to big data frameworks such as MapReduce and proposed distributed algorithms. However, the existing MapReduce algorithms assume that the data is static and do not handle incremental database updates; instead, they re-mine the entire updated database whenever new sequences are inserted. In this paper, we propose an efficient distributed algorithm for incremental sequential pattern mining (MR-INCSPM) using the MapReduce framework that can handle big data. The proposed algorithm incorporates the backward mining approach, which efficiently makes use of the knowledge obtained during the previous mining process. Also, based on the study of item co-occurrences, we propose the Co-occurrence Reverse Map (CRMAP) data structure, which is used to address the combinatorial explosion of candidate sequences. In addition, novel candidate generation and early prune mechanisms are designed using CRMAP to speed up the mining process. The proposed algorithm is evaluated on both real and synthetic datasets. The experimental results demonstrate the efficacy of MR-INCSPM with respect to processing time, memory and pruning efficiency. [ABSTRACT FROM AUTHOR]
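The abstract does not specify how CRMAP is laid out, so the following is only a minimal sketch of the general idea of co-occurrence-based early pruning, not the authors' data structure or algorithm. It assumes sequences of single items (no itemsets), runs on one machine rather than under MapReduce, and uses hypothetical names (`build_reverse_cooccurrence`, `can_prepend`). The map records, for each item, which items frequently occur before it; a candidate formed by prepending an item to a pattern can then be discarded without counting if that item does not frequently precede every item of the pattern.

```python
from collections import defaultdict


def build_reverse_cooccurrence(sequences, min_support):
    """Map each item b to the set of items a such that a occurs
    before b in at least min_support sequences (hypothetical helper)."""
    pair_counts = defaultdict(int)
    for seq in sequences:
        seen_before = set()   # items seen at earlier positions of this sequence
        pairs_in_seq = set()  # each (earlier, later) pair counts once per sequence
        for item in seq:
            for earlier in seen_before:
                pairs_in_seq.add((earlier, item))
            seen_before.add(item)
        for pair in pairs_in_seq:
            pair_counts[pair] += 1

    reverse_map = defaultdict(set)
    for (earlier, later), count in pair_counts.items():
        if count >= min_support:
            reverse_map[later].add(earlier)
    return dict(reverse_map)


def can_prepend(reverse_map, item, pattern):
    """Early prune check: if <item> followed by pattern were frequent,
    then <item, x> would be frequent for every x in pattern. So the
    candidate is kept only when item frequently precedes every x."""
    return all(item in reverse_map.get(later, set()) for later in pattern)


# Toy database (illustrative data, not from the paper).
sequences = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["b", "c"]]
reverse_map = build_reverse_cooccurrence(sequences, min_support=2)
keep_ac = can_prepend(reverse_map, "a", ["c"])        # <a, c> survives the check
keep_abc = can_prepend(reverse_map, "a", ["b", "c"])  # <a, b, c> is pruned early
```

The check is a necessary condition only: a candidate that passes must still have its support counted, while one that fails can be skipped safely because every subsequence of a frequent sequence is itself frequent.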
- Subjects :
- *SEQUENTIAL pattern mining
*BIG data
*DATA mining
*DISTRIBUTED algorithms
*PRUNING
Details
- Language :
- English
- ISSN :
- 09574174
- Volume :
- 133
- Database :
- Academic Search Index
- Journal :
- Expert Systems with Applications
- Publication Type :
- Academic Journal
- Accession number :
- 136911830
- Full Text :
- https://doi.org/10.1016/j.eswa.2019.05.013