Back to Search Start Over

A Generic Method for Accelerating LSH-Based Similarity Join Processing.

Authors :
Yu, Chenyun
Nutanong, Sarana
Li, Hangyu
Wang, Cong
Yuan, Xingliang
Source :
IEEE Transactions on Knowledge & Data Engineering. Apr2017, Vol. 29 Issue 4, p712-726. 15p.
Publication Year :
2017

Abstract

Locality sensitive hashing (LSH) is an efficient method for solving the problem of approximate similarity search in high-dimensional spaces. Through LSH, a high-dimensional similarity join can be processed in the same way as hash join, making the cost of joining two large datasets linear. By judicially analyzing the properties of multiple LSH algorithms, we propose a generic method to speed up the process of joining two large datasets using LSH. The crux of our method lies in the way which we identify a set of representative points to reduce the number of LSH lookups. Theoretical analyzes show that our proposed method can greatly reduce the number of lookup operations and retain the same result accuracy compared to executing LSH lookups for every query point. Furthermore, we demonstrate the generality of our method by showing that the same principle can be applied to LSH algorithms for three different metrics: the Euclidean distance (QALSH), Jaccard similarity measure (MinHash), and Hamming distance (sequence hashing). Results from experimental studies using real datasets confirm our error analyzes and show significant improvements of our method over the state-of-the-art LSH method: to achieve over 0.95 recall, we only need to operate LSH lookups for at most 15 percent of the query points. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
10414347
Volume :
29
Issue :
4
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
121767229
Full Text :
https://doi.org/10.1109/TKDE.2016.2638838