A Transformation-Based Framework for KNN Set Similarity Search.

Authors :
Zhang, Yong
Wu, Jiacheng
Wang, Jin
Xing, Chunxiao
Source :
IEEE Transactions on Knowledge & Data Engineering. Mar2020, Vol. 32 Issue 3, p409-423. 15p.
Publication Year :
2020

Abstract

Set similarity search is a fundamental operation in a variety of applications. While many previous studies focus on threshold-based set similarity search and join, little effort has been devoted to KNN set similarity search. In this paper, we propose a transformation-based framework to solve the problem of KNN set similarity search, which, given a collection of set records and a query set, returns the $k$ results with the largest similarity to the query. We devise an effective transformation mechanism to transform sets of various lengths into fixed-length vectors such that similar sets are mapped closer to each other. Then, we index these vectors with a tiny tree structure. Next, we propose efficient search algorithms and pruning strategies to perform exact KNN set similarity search. We also design an estimation technique that leverages the data distribution to support approximate KNN search, which can speed up the search while retaining high recall. Experimental results on real-world datasets show that our framework significantly outperforms state-of-the-art methods in both memory-based and disk-based settings. [ABSTRACT FROM AUTHOR]
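
To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch of exact KNN set similarity search. It is not the transformation-based framework the paper proposes; it is only the naive baseline that such a framework aims to beat. The abstract does not name a similarity function, so Jaccard similarity is assumed here, and the names `jaccard` and `knn_brute_force` are hypothetical.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| (assumed metric, not from the abstract)."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def knn_brute_force(records, query, k):
    """Return the k (similarity, index) pairs most similar to `query`.

    Scans every record, so the cost is linear in the collection size.
    Ties in similarity are broken in favor of the lower record index.
    """
    scored = ((jaccard(r, query), i) for i, r in enumerate(records))
    return heapq.nlargest(k, scored, key=lambda t: (t[0], -t[1]))


records = [{1, 2, 3}, {2, 3, 4}, {7, 8}, {1, 2, 3, 4}]
result = knn_brute_force(records, {1, 2, 3}, k=2)
```

With the sample collection above, `result` holds the two records closest to the query `{1, 2, 3}`: record 0 (an exact match) and record 3. Every query touches every record, which is the cost that indexing and pruning strategies are meant to avoid.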

Details

Language :
English
ISSN :
1041-4347
Volume :
32
Issue :
3
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
141599657
Full Text :
https://doi.org/10.1109/TKDE.2018.2886189