Back to Search
Start Over
Mining Weighted Frequent Itemsets without Candidate Generation in Uncertain Databases
- Source :
- International Journal of Information Technology & Decision Making. 16:1549-1579
- Publication Year :
- 2017
- Publisher :
- World Scientific Pub Co Pte Lt, 2017.
-
Abstract
- Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weight/importance. It can thus discover itemsets that are more useful and meaningful by ignoring irrelevant itemsets with lower weights. UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, these two ideas have been combined in the HEWI-Uapriori algorithm. This latter considers both item weights and transaction uncertainty to mine the high expected weighted itemsets (HEWIs) using a two-phase Apriori-based approach. Although the upper-bound proposed in HEWI-Uapriori can reduce the size of the search space, it still generates a large amount of candidates and uses a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to efficiently mine HEWIs without performing multiple database scans and without generating candidates. This algorithm relies on three novel structures named element (E)-table, weighted-probability (WP)-table and WP-tree to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional methods for WFIM and UFIM, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.
- Subjects :
- Database
Computer science
Efficient algorithm
InformationSystems_DATABASEMANAGEMENT
02 engineering and technology
Space (commercial competition)
computer.software_genre
Set (abstract data type)
020204 information systems
Scalability
0202 electrical engineering, electronic engineering, information engineering
Computer Science (miscellaneous)
A priori and a posteriori
020201 artificial intelligence & image processing
Pruning (decision trees)
Data mining
Element (category theory)
computer
Database transaction
Subjects
Details
- ISSN :
- 17936845 and 02196220
- Volume :
- 16
- Database :
- OpenAIRE
- Journal :
- International Journal of Information Technology & Decision Making
- Accession number :
- edsair.doi...........1eba01a75347c6869b8a33015065986c
- Full Text :
- https://doi.org/10.1142/s0219622017500341