Distributed load balancing frequent colossal closed itemset mining algorithm for high dimensional dataset
- Source :
- Journal of Parallel and Distributed Computing. 144:136-152
- Publication Year :
- 2020
- Publisher :
- Elsevier BV, 2020.
Abstract
- Extracting colossal closed itemsets from high dimensional biological datasets has attracted great interest in recent times. A massive set of short and average-sized mined itemsets does not capture complete and valuable information for decision making, yet traditional itemset mining algorithms spend an enormous amount of time mining exactly such itemsets. Growing research interest in bioinformatics and the abundance of data across a variety of domains have led to the generation of high dimensional datasets, which are characterized by a large number of features and a small number of rows. Colossal closed itemsets are significant for numerous applications, including bioinformatics, and are influential in decision making. Extracting a large amount of information and knowledge from a high dimensional dataset is a nontrivial task. The existing colossal closed itemset mining algorithms for high dimensional datasets are sequential and computationally expensive. Distributed and parallel computing is a good strategy for overcoming the inefficiency of the existing sequential algorithms. The Balanced Distributed Parallel Frequent Colossal Closed Itemset Mining (BDPFCCIM) algorithm is designed for high dimensional datasets. It incorporates an efficient closeness checking method to check the closeness of a rowset and an efficient pruning strategy to prune the row enumeration search space. The proposed BDPFCCIM algorithm is the first distributed load balancing algorithm to mine frequent colossal closed itemsets from high dimensional biological datasets. The experimental results demonstrate the efficient performance of the proposed BDPFCCIM algorithm in comparison with state-of-the-art algorithms.
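As a rough illustration of the terms used in the abstract, the sketch below runs a sequential, single-machine row enumeration on a toy dataset. It is not the BDPFCCIM algorithm, whose closeness check, pruning strategy, and load balancing are not described in this record. The function names (`supporting_rows`, `mine_colossal_closed`) and the parameters `min_sup` and `min_len` are hypothetical. The sketch assumes that a rowset is "closed" when no other row contains all items common to the rowset, and that "colossal" means at least `min_len` items.

```python
def supporting_rows(rows, itemset):
    """Indices of all rows that contain every item of itemset."""
    return frozenset(i for i, row in enumerate(rows) if itemset <= row)


def mine_colossal_closed(rows, min_sup, min_len):
    """Depth-first row enumeration (illustrative sketch only).

    Returns a dict mapping each closed itemset with at least min_len
    items and at least min_sup supporting rows to its support count.
    """
    rows = [frozenset(r) for r in rows]
    n = len(rows)
    found = {}

    def extend(rowset, items, start):
        # Closeness check (assumed definition): the rowset is closed when
        # it is exactly the set of rows containing its common items.
        if len(rowset) >= min_sup and supporting_rows(rows, items) == frozenset(rowset):
            found[items] = len(rowset)
        for r in range(start, n):
            new_items = items & rows[r] if rowset else rows[r]
            # Pruning: adding rows can only shrink the common itemset,
            # so a branch already below min_len cannot recover.
            if len(new_items) < min_len:
                continue
            extend(rowset + (r,), new_items, r + 1)

    extend((), frozenset(), 0)
    return found


# Toy dataset: 4 rows over the items a-f (made up for illustration).
toy_rows = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "f"},
    {"c", "d"},
]
result = mine_colossal_closed(toy_rows, min_sup=2, min_len=2)
```

Enumerating rows rather than items suits datasets with few rows and many features, because the search space grows with the number of rows. A distributed version would additionally have to partition this search space across workers, which the sketch does not attempt.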
- Subjects :
- Computer Networks and Communications
Computer science
Closeness
InformationSystems_DATABASEMANAGEMENT
020206 networking & telecommunications
02 engineering and technology
High dimensional
Load balancing (computing)
computer.software_genre
Data mining algorithm
Theoretical Computer Science
ComputingMethodologies_PATTERNRECOGNITION
Artificial Intelligence
Hardware and Architecture
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
computer
Row
Computer Science::Databases
Software
Sequential algorithm
- ISSN :
- 07437315
- Volume :
- 144
- Database :
- OpenAIRE
- Journal :
- Journal of Parallel and Distributed Computing
- Accession number :
- edsair.doi...........b3687a31cf296c9aea75cc9260d411f3
- Full Text :
- https://doi.org/10.1016/j.jpdc.2020.05.017