Distributed load balancing frequent colossal closed itemset mining algorithm for high dimensional dataset

Authors :
Nagamma Patil
Manjunath K Vanahalli
Source :
Journal of Parallel and Distributed Computing. 144:136-152
Publication Year :
2020
Publisher :
Elsevier BV, 2020.

Abstract

Extracting colossal closed itemsets from high dimensional biological datasets has attracted considerable attention in recent years. A massive set of short and average sized mined itemsets does not carry complete and valuable information for decision making, yet traditional itemset mining algorithms spend an enormous amount of time mining exactly such itemsets. Growing research interest in bioinformatics and the abundance of data across a variety of domains have led to the generation of high dimensional datasets, which are characterized by a large number of features and a small number of rows. Colossal closed itemsets are significant for numerous applications, including bioinformatics, and are influential in decision making. Extracting a large amount of information and knowledge from a high dimensional dataset is a nontrivial task. The existing colossal closed itemset mining algorithms for high dimensional datasets are sequential and computationally expensive. Distributed and parallel computing is a good strategy for overcoming the inefficiency of the existing sequential algorithms. The Balanced Distributed Parallel Frequent Colossal Closed Itemset Mining (BDPFCCIM) algorithm is designed for high dimensional datasets. It incorporates an efficient closeness checking method to check the closeness of a rowset and an efficient pruning strategy to prune the row enumeration mining search space. The proposed BDPFCCIM algorithm is the first distributed load balancing algorithm to mine frequent colossal closed itemsets from high dimensional biological datasets. Experimental results demonstrate the efficient performance of the proposed BDPFCCIM algorithm in comparison with state-of-the-art algorithms.
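
To make the terms in the abstract concrete, the following is a minimal, sequential brute-force sketch of row enumeration with a closure check on a rowset. It is not the BDPFCCIM algorithm, which is distributed and uses its own closeness checking and pruning methods not described in this record. The function names, the `min_support` and `min_cardinality` parameters, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations

def itemset_of(rows, rowset):
    """Return the items shared by every row in the rowset."""
    return frozenset.intersection(*(rows[r] for r in rowset))

def is_closed_rowset(rows, rowset):
    """A rowset is closed if no row outside it contains its shared itemset."""
    items = itemset_of(rows, rowset)
    support = {r for r, row_items in rows.items() if items <= row_items}
    return support == set(rowset)

def mine_colossal_closed(rows, min_support, min_cardinality):
    """Enumerate rowsets (hypothetical brute force, exponential in the row count).

    Keeps itemsets supported by at least min_support rows (frequent),
    containing at least min_cardinality items (colossal), and closed.
    """
    results = {}
    row_ids = sorted(rows)
    for size in range(max(1, min_support), len(row_ids) + 1):
        for rowset in combinations(row_ids, size):
            items = itemset_of(rows, rowset)
            if len(items) < min_cardinality:
                continue  # too short to count as colossal
            if is_closed_rowset(rows, rowset):
                results[items] = frozenset(rowset)
    return results

# Toy dataset: few rows, each row a set of features (illustrative only).
rows = {
    1: frozenset("abcde"),
    2: frozenset("abcd"),
    3: frozenset("abcf"),
    4: frozenset("xy"),
}
found = mine_colossal_closed(rows, min_support=2, min_cardinality=3)
for items, rowset in found.items():
    print(sorted(items), sorted(rowset))
```

Enumerating rows rather than items suits datasets with many features and few rows, since the number of rowsets is bounded by the small row count. Adding rows to a rowset can only shrink the shared itemset, which is the kind of property a pruning strategy over the row enumeration search space can exploit; the sketch above does not prune.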

Details

ISSN :
0743-7315
Volume :
144
Database :
OpenAIRE
Journal :
Journal of Parallel and Distributed Computing
Accession number :
edsair.doi...........b3687a31cf296c9aea75cc9260d411f3
Full Text :
https://doi.org/10.1016/j.jpdc.2020.05.017