Compact Indexing and Judicious Searching for Billion-Scale Microblog Retrieval
- Source :
- ACM Transactions on Information Systems. 35:1-24
- Publication Year :
- 2017
- Publisher :
- Association for Computing Machinery (ACM), 2017.
Abstract
- In this article, we study the problem of efficient top-k disjunctive query processing over a huge microblog dataset. For compact indexing, we categorize keywords into rare terms and common terms based on inverse document frequency (idf) and propose a tailored block-oriented organization to reduce memory consumption. For fast searching, we classify queries into three types based on the categories of their terms and judiciously design an efficient search algorithm for each type. We conducted extensive experiments on a billion-scale Twitter dataset and examined performance with both simple and more advanced ranking functions. The results show that, with a much smaller index, our search algorithm runs 2 to 3 times faster than state-of-the-art solutions in both ranking scenarios.
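The idf-based term categorization, the query typing, and top-k disjunctive (OR) query processing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The idf formula (log(N / df)), the cutoff `RARE_IDF_THRESHOLD`, the three query-type labels (assumed here to be rare-only, common-only, and mixed), and the exhaustive heap-based tf-idf scorer are all hypothetical illustrations. The paper's block-oriented index and type-specific search algorithms are not reproduced.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical cutoff: terms whose idf is at or above this value count as "rare".
# The abstract does not state the paper's actual criterion.
RARE_IDF_THRESHOLD = 1.0


def build_index(docs):
    """Build a toy inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def idf(index, term, num_docs):
    """Plain log(N / df); returns 0.0 for a term absent from the index."""
    df = len(index.get(term, {}))
    return math.log(num_docs / df) if df else 0.0


def categorize(index, term, num_docs):
    """Label a term 'rare' or 'common' by comparing its idf to the cutoff."""
    return "rare" if idf(index, term, num_docs) >= RARE_IDF_THRESHOLD else "common"


def query_type(index, terms, num_docs):
    """Hypothetical three-way query split based on the categories of its terms."""
    cats = {categorize(index, t, num_docs) for t in terms}
    if cats == {"rare"}:
        return "rare-only"
    if cats == {"common"}:
        return "common-only"
    return "mixed"


def top_k_disjunctive(index, terms, num_docs, k):
    """Exhaustive OR query: any document containing at least one query term is
    scored by summed tf * idf, and the k best are kept in a size-k min-heap."""
    scores = defaultdict(float)
    for t in set(terms):
        w = idf(index, t, num_docs)
        for doc_id, tf in index.get(t, {}).items():
            scores[doc_id] += tf * w
    heap = []  # min-heap of (score, doc_id); heap[0] is the current k-th best
    for doc_id, s in scores.items():
        if len(heap) < k:
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc_id))
    # Highest score first; ties broken by smaller doc_id.
    return sorted(heap, key=lambda x: (-x[0], x[1]))
```

For example, over the four toy posts `"breaking news earthquake"`, `"news today"`, `"news weather today"`, and `"earthquake relief news"`, the term `news` (in every post, idf 0) is common while `breaking` (in one post, idf log 4) is rare, so the query `["breaking", "news"]` is typed as mixed. An exhaustive scorer like this touches every posting of every query term; the abstract's point is that index organization and per-type search algorithms can avoid much of that work.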
- Subjects :
- Speedup
Information retrieval
Computer science
Microblogging
Search engine indexing
02 engineering and technology
computer.software_genre
General Business, Management and Accounting
Computer Science Applications
Ranking (information retrieval)
Term (time)
Index (publishing)
Search algorithm
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Social media
Data mining
tf–idf
computer
Information Systems
Details
- ISSN :
- 1558-2868 and 1046-8188
- Volume :
- 35
- Database :
- OpenAIRE
- Journal :
- ACM Transactions on Information Systems
- Accession number :
- edsair.doi...........23386cd57787ff08013bd208a8ed743d
- Full Text :
- https://doi.org/10.1145/3052771