Finding Similar Documents Using Frequent Pattern Mining Methods.

Authors :
Sohrabi, Mohammad Karim
Azgomi, Hossein
Source :
International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems. Feb 2019, Vol. 27 Issue 1, p73-96. 24p.
Publication Year :
2019

Abstract

Mining massive datasets raises various problems, one of which is finding similar documents. The shingling method converts this problem into a set-based problem. Some existing methods use min-hashing to compress the results derived from the shingling method and then apply the locality-sensitive hashing (LSH) method to find candidate pairs for similarity search among all pairs of documents. In this paper, an apriori-based method is proposed for finding similar documents based on a frequent itemset mining approach. To this end, the apriori algorithm is modified and customized for the similarity search problem. Modeling the similarity search problem as a frequent pattern mining problem, using a modified version of apriori, and dynamically selecting the minimum support threshold are the most important advantages of the proposed method, and they lead to its appropriate execution time and high-quality results. The proposed method finds similar documents in less time than the combined method and the MCVM method because it generates fewer candidate pairs. Furthermore, experimental results show the high quality of the answers of the proposed method. [ABSTRACT FROM AUTHOR]
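
The general idea in the abstract (shingle the documents, then treat similarity search as frequent itemset mining over document pairs) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not describe their modified apriori or how the minimum support threshold is chosen dynamically, so the sketch assumes a fixed threshold and a simple modeling in which each shingle is a transaction and the documents containing it are its items. All function names and parameters (`shingles`, `candidate_pairs`, `similar_documents`, `k`, `min_support`, `threshold`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of word-level k-shingles of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(doc_shingles, min_support):
    """Apriori-style pair mining (assumed modeling, fixed threshold).

    Each shingle is a transaction; its items are the documents containing it.
    A pair of documents is frequent if it co-occurs in >= min_support shingles.
    """
    # Level 1: a document's support is the number of shingles it contains.
    frequent_docs = {d for d, sh in doc_shingles.items() if len(sh) >= min_support}

    # Build the transactions, restricted to frequent documents.
    transactions = defaultdict(set)
    for doc_id in frequent_docs:
        for s in doc_shingles[doc_id]:
            transactions[s].add(doc_id)

    # Level 2: count the support of every co-occurring document pair.
    support = defaultdict(int)
    for docs in transactions.values():
        for pair in combinations(sorted(docs), 2):
            support[pair] += 1
    return {pair for pair, count in support.items() if count >= min_support}


def similar_documents(docs, k=3, min_support=2, threshold=0.5):
    """Verify candidate pairs with exact Jaccard similarity on shingle sets."""
    doc_shingles = {d: shingles(t, k) for d, t in docs.items()}
    result = {}
    for a, b in candidate_pairs(doc_shingles, min_support):
        sim = jaccard(doc_shingles[a], doc_shingles[b])
        if sim >= threshold:
            result[(a, b)] = sim
    return result


if __name__ == "__main__":
    docs = {
        "d1": "the quick brown fox jumps over the lazy dog",
        "d2": "the quick brown fox jumps over the sleepy dog",
        "d3": "frequent pattern mining finds itemsets in transactions",
    }
    print(similar_documents(docs))
```

In this sketch only pairs that share at least `min_support` shingles are ever compared, which is one way a frequent-pattern formulation can reduce the number of candidate pairs relative to comparing all pairs of documents.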

Details

Language :
English
ISSN :
02184885
Volume :
27
Issue :
1
Database :
Academic Search Index
Journal :
International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems
Publication Type :
Academic Journal
Accession number :
134824878
Full Text :
https://doi.org/10.1142/S0218488519500041