1. LSH-Based Probabilistic Pruning of Inverted Indices for Sets and Ranked Lists
- Author
-
Koninika Pal and Sebastian Michel
- Subjects
Optimization problem ,Index (economics) ,Computer science ,business.industry ,Nearest neighbor search ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Probabilistic logic ,02 engineering and technology ,computer.software_genre ,Machine learning ,Locality-sensitive hashing ,Principal variation search ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Probabilistic analysis of algorithms ,Data mining ,Artificial intelligence ,Pruning (decision trees) ,business ,computer - Abstract
We address the problem of index pruning without compromising the quality of ad-hoc similarity search among sets and ranked lists. We discuss three different ways to prune the index structure and, by linking the index structure with the concept of Locality Sensitive Hashing (LSH), we introduce two solutions to query processing over the pruned index. Through a probabilistic analysis we ensure that a user-defined recall goal is still guaranteed. We are able to formulate an optimization problem that can determine the optimal pruning factor for all three pruning methods. The experimental evaluations over real-world data validate that the optimal pruning factor indeed ensures the recall goal without any significant effect on the quality of similarity search on a much smaller index.
- Published
- 2017
- Full Text
- View/download PDF