Accelerated Query Processing Via Similarity Score Prediction

Authors :: Alistair Moffat
J. Shane Culpepper
Matthias Petri
Joel Mackenzie
Daniel Beck
Source :: SIGIR
Publication Year :: 2019
Publisher :: ACM, 2019.
Abstract: Processing top-k bag-of-words queries is critical to many information retrieval applications, including web-scale search. In this work, we consider algorithmic properties associated with dynamic pruning mechanisms. Such algorithms maintain a score threshold (the k-th highest similarity score identified so far) so that low-scoring documents can be bypassed, allowing fast top-k retrieval with no loss in effectiveness. In standard pruning algorithms the score threshold is initialized to the lowest possible value. To accelerate processing, we make use of term- and query-dependent features to predict the final value of that threshold, and then employ the predicted value right from the commencement of processing. Because of the asymmetry associated with prediction errors (if the estimated threshold is too high the query will need to be re-executed in order to assure the correct answer), the prediction process must be risk-sensitive. We explore techniques for balancing those factors, and provide detailed experimental results that show the practical usefulness of the new approach.
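The threshold mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the in-memory dictionary index, the per-document upper bound (a crude stand-in for the posting-list skipping done by WAND-style algorithms), the function names, and the `safety` multiplier (a simple placeholder for a risk-sensitive predictor) are all assumptions made for illustration. Scores are assumed non-negative, so 0.0 plays the role of the "lowest possible" initial threshold. The sketch shows the asymmetry the abstract mentions: a predicted threshold at or below the true k-th score lets more documents be bypassed with an identical answer, while one that is too high leaves fewer than k results and forces a second, unpruned run.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy index: doc_id -> {term: precomputed term-document score}.
Index = Dict[int, Dict[str, float]]
Result = List[Tuple[float, int]]  # (score, doc_id), best first


def term_upper_bounds(index: Index) -> Dict[str, float]:
    """Largest score each term contributes to any single document."""
    bounds: Dict[str, float] = {}
    for postings in index.values():
        for term, s in postings.items():
            bounds[term] = max(bounds.get(term, 0.0), s)
    return bounds


def topk_with_threshold(index: Index, query: List[str], k: int,
                        theta0: float) -> Tuple[Result, int]:
    """Threshold-pruned top-k evaluation that starts from threshold theta0.

    Returns the results (descending score) and how many documents were fully scored.
    """
    bounds = term_upper_bounds(index)
    heap: Result = []  # min-heap holding the best k (score, doc_id) pairs so far
    scored = 0
    for doc_id, postings in index.items():
        # Until k documents are held, the initial (possibly predicted) threshold
        # applies; afterwards it is the k-th best score found so far.
        threshold = heap[0][0] if len(heap) == k else theta0
        # Skip documents whose score cannot exceed the threshold.
        ub = sum(bounds[t] for t in query if t in postings)
        if ub <= threshold:
            continue
        scored += 1
        score = sum(postings.get(t, 0.0) for t in query)
        if score <= threshold:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        else:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), scored


def query_with_prediction(index: Index, query: List[str], k: int,
                          predicted: float, safety: float = 1.0
                          ) -> Tuple[Result, int, bool]:
    """Start from a predicted threshold; re-execute if it proved too high.

    `safety` (hypothetical) scales the prediction down to reduce the risk of a re-run.
    Returns (results, total documents scored, whether a re-run was needed).
    """
    results, scored = topk_with_threshold(index, query, k, predicted * safety)
    if len(results) < k:
        # Fewer than k documents beat the initial threshold, so it may have exceeded
        # the true k-th score: fall back to the standard lowest-value start.
        retry, scored_again = topk_with_threshold(index, query, k, 0.0)
        return retry, scored + scored_again, True
    return results, scored, False


if __name__ == "__main__":
    toy = {1: {"a": 2.0, "b": 1.0}, 2: {"a": 0.5}, 3: {"b": 2.5},
           4: {"a": 1.0, "b": 0.5}, 5: {"c": 4.0}}
    q = ["a", "b"]
    print(topk_with_threshold(toy, q, 2, 0.0))  # exhaustive start: 4 documents scored
    print(query_with_prediction(toy, q, 2, 2.0))  # good estimate: 3 scored, no re-run
    print(query_with_prediction(toy, q, 2, 2.8))  # too high: re-run, 6 scored in total
```

All three runs return the same top-2 list, `[(3.0, 1), (2.5, 3)]`; only the amount of work differs. Any guarantee in this sketch rests on strict comparisons: when k documents survive an initial threshold, that threshold was strictly below the true k-th score, so no qualifying document was skipped.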

Subjects :: Computer science
05 social sciences
Process (computing)
Value (computer science)
02 engineering and technology
Information retrieval applications
Inverted index
Term (time)
Similarity (network science)
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Information system
Data mining
Pruning (decision trees)
0509 other social sciences
050904 information & library sciences

Database :: OpenAIRE
Journal :: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Accession number :: edsair.doi...........90f13f33770242fad290330d1a0c2b43
Full Text :: https://doi.org/10.1145/3331184.3331207