Back to Search Start Over

Mining Weighted Frequent Itemsets without Candidate Generation in Uncertain Databases

Authors :
Wensheng Gan
Jerry Chun-Wei Lin
Philippe Fournier-Viger
Tzung-Pei Hong
Han-Chieh Chao
Source :
International Journal of Information Technology & Decision Making. 16:1549-1579
Publication Year :
2017
Publisher :
World Scientific Pub Co Pte Lt, 2017.

Abstract

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weight/importance. It can thus discover itemsets that are more useful and meaningful by ignoring irrelevant itemsets with lower weights. UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, these two ideas have been combined in the HEWI-Uapriori algorithm. This latter considers both item weights and transaction uncertainty to mine the high expected weighted itemsets (HEWIs) using a two-phase Apriori-based approach. Although the upper-bound proposed in HEWI-Uapriori can reduce the size of the search space, it still generates a large amount of candidates and uses a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to efficiently mine HEWIs without performing multiple database scans and without generating candidates. This algorithm relies on three novel structures named element (E)-table, weighted-probability (WP)-table and WP-tree to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional methods for WFIM and UFIM, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.

Details

ISSN :
17936845 and 02196220
Volume :
16
Database :
OpenAIRE
Journal :
International Journal of Information Technology & Decision Making
Accession number :
edsair.doi...........1eba01a75347c6869b8a33015065986c
Full Text :
https://doi.org/10.1142/s0219622017500341