
Supporting a Social Media Observatory with Customizable Index Structures: Architecture and Performance

Authors :
Andrew J. Younge
Evan Roth
Judy Qiu
Xiaoming Gao
Clayton A. Davis
Filippo Menczer
Emilio Ferrara
Karissa McKelvey
Source :
Cloud Computing for Data-Intensive Applications (ISBN 978-1-4939-1904-8)
Publication Year :
2014
Publisher :
Springer New York, 2014.

Abstract

Intensive research on the analysis of social media and micro-blogging data in recent years points to both the need for and the great potential of platforms that can efficiently store, query, analyze, and visualize social media data. To support these “social media observatories” effectively, a storage platform must meet special requirements: it must load and store multi-terabyte datasets, and it must efficiently evaluate queries that analyze the text of millions of social updates. Traditional inverted indexing techniques do not meet these requirements. As a solution, we propose IndexedHBase, a general indexing framework for building customized index structures that support efficient queries on the HBase distributed data storage system. IndexedHBase supports a social media observatory that collects and analyzes data obtained through the Twitter streaming API. We develop a parallel query evaluation strategy that makes efficient use of the customized index structures and test it on a set of typical social media data queries. We evaluate the performance of IndexedHBase on FutureGrid and compare it with Riak, a widely adopted commercial NoSQL database system. The results show that IndexedHBase loads data six times faster than Riak and is significantly more efficient at evaluating queries with large result sets.
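The record does not describe the index layout, so the following is only a minimal sketch under stated assumptions of what a "customized" text index with parallel query evaluation could look like. It assumes, hypothetically, an index keyed by term whose entries map tweet IDs to tweet creation times, so that a time-window filter can be applied to the index alone without fetching the tweets themselves, and it assumes the index is split into regions that are scanned in parallel and then merged. It uses plain Python dictionaries and a thread pool, not the HBase client API; the names `build_region`, `scan_region`, and `query` are illustrative and do not come from IndexedHBase.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set

# Hypothetical in-memory stand-in for one region of an index table:
# row key = term, column = tweet ID, value = tweet creation time (epoch seconds).
IndexRegion = Dict[str, Dict[str, int]]


def build_region(tweets: List[dict]) -> IndexRegion:
    """Index each tweet under every distinct lowercase word of its text."""
    region: IndexRegion = {}
    for t in tweets:
        for term in set(t["text"].lower().split()):
            region.setdefault(term, {})[t["id"]] = t["created_at"]
    return region


def scan_region(region: IndexRegion, term: str, start: int, end: int) -> Set[str]:
    """Return IDs of tweets containing `term` created in [start, end).

    The time filter reads only the index entries, so the original
    tweets never have to be loaded to answer the query.
    """
    postings = region.get(term, {})
    return {tid for tid, ts in postings.items() if start <= ts < end}


def query(regions: List[IndexRegion], term: str, start: int, end: int) -> Set[str]:
    """Scan all regions in parallel and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda r: scan_region(r, term, start, end), regions)
    return set().union(*parts)


# Usage: two index regions, queried for "hbase" in the window [0, 200).
regions = [
    build_region([{"id": "t1", "text": "HBase rocks", "created_at": 100},
                  {"id": "t2", "text": "hbase index", "created_at": 250}]),
    build_region([{"id": "t3", "text": "Riak vs HBase", "created_at": 180}]),
]
print(sorted(query(regions, "hbase", 0, 200)))  # ['t1', 't3']
```

Storing the creation time inside the index entry is the design choice that matters in this sketch: a time-bounded text query becomes a pure index scan, and because each region is scanned independently, the work divides naturally across parallel workers.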

Details

ISBN :
978-1-4939-1904-8
Database :
OpenAIRE
Accession number :
edsair.doi...........eb7fc5c4c2245f620d8d310e0559ff59