
Compact Indexing and Judicious Searching for Billion-Scale Microblog Retrieval

Authors :
Dongxiang Zhang
Tat-Seng Chua
Liqiang Nie
Heng Tao Shen
Huanbo Luan
Kian-Lee Tan
Source :
ACM Transactions on Information Systems. 35:1-24
Publication Year :
2017
Publisher :
Association for Computing Machinery (ACM), 2017.

Abstract

In this article, we study the problem of efficient top-k disjunctive query processing over a huge microblog dataset. For compact indexing, we categorize keywords into rare terms and common terms based on inverse document frequency (idf) and propose a tailored block-oriented organization to reduce memory consumption. For fast searching, we classify queries into three types based on term category and judiciously design an efficient search algorithm for each type. We conducted extensive experiments on a billion-scale Twitter dataset and examined performance with both simple and more advanced ranking functions. The results showed that, with a much smaller index, our search algorithm achieves a 2–3× speedup over state-of-the-art solutions in both ranking scenarios.
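The abstract names three ideas: splitting terms into rare and common by idf, grouping queries by the categories of their terms, and top-k disjunctive (OR) retrieval. The Python sketch below illustrates each idea on a toy in-memory index. It rests on several assumptions. The smoothed idf formula, the cutoff `IDF_THRESHOLD`, the three query-type labels (rare-only, common-only, mixed), and every function name are hypothetical. The scorer is a plain exhaustive tf-idf baseline. It is not the authors' block-oriented index or their per-type search algorithms.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical cutoff (assumption): terms whose idf exceeds it are "rare".
IDF_THRESHOLD = 0.7


def build_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.split()).items():
            index[term].append((doc_id, tf))
    return index


def idf(index, term, n_docs):
    """Smoothed idf (an assumed variant); unseen terms get the highest value."""
    df = len(index.get(term, []))
    return math.log((n_docs + 1) / (df + 1))


def term_category(index, term, n_docs):
    """Label a term 'rare' or 'common' by comparing its idf to the cutoff."""
    return "rare" if idf(index, term, n_docs) > IDF_THRESHOLD else "common"


def query_type(index, terms, n_docs):
    """Assumed three-way split of queries by the categories of their terms."""
    cats = {term_category(index, t, n_docs) for t in terms}
    if cats == {"rare"}:
        return "rare-only"
    if cats == {"common"}:
        return "common-only"
    return "mixed"


def top_k_disjunctive(index, terms, n_docs, k):
    """Exhaustive OR-query baseline: sum tf * idf over matching terms,
    then keep the k highest-scoring documents (ties go to the lower doc_id)."""
    scores = defaultdict(float)
    for t in dict.fromkeys(terms):  # deduplicate while keeping query order
        w = idf(index, t, n_docs)
        for doc_id, tf in index.get(t, []):
            scores[doc_id] += tf * w
    best = heapq.nlargest(k, scores.items(), key=lambda item: (item[1], -item[0]))
    return [(doc_id, score) for doc_id, score in best]
```

Usage: build the index with `build_index(["a b", "a c", "a d", "a b e"])`. The query `["b", "c"]` is then of type `"mixed"`, because `b` falls below the cutoff and `c` lies above it. For that query, `top_k_disjunctive` with `k=2` returns document 1 first. A real billion-scale system would avoid scoring every posting. The abstract's block-oriented organization and per-type algorithms target exactly that cost.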

Details

ISSN :
1558-2868 and 1046-8188
Volume :
35
Database :
OpenAIRE
Journal :
ACM Transactions on Information Systems
Accession number :
edsair.doi...........23386cd57787ff08013bd208a8ed743d
Full Text :
https://doi.org/10.1145/3052771