Supporting a Social Media Observatory with Customizable Index Structures: Architecture and Performance
- Source :
- Cloud Computing for Data-Intensive Applications (ISBN: 9781493919048)
- Publication Year :
- 2014
- Publisher :
- Springer New York, 2014.
Abstract
- The intensive research activity in analysis of social media and micro-blogging data in recent years suggests the necessity and great potential of platforms that can efficiently store, query, analyze, and visualize social media data. To support these “social media observatories” effectively, a storage platform must satisfy special requirements for loading and storage of multi-terabyte datasets, as well as efficient evaluation of queries involving analysis of the text of millions of social updates. Traditional inverted indexing techniques do not meet such requirements. As a solution, we propose a general indexing framework, IndexedHBase, to build specially customized index structures for facilitating efficient queries on an HBase distributed data storage system. IndexedHBase is used to support a social media observatory that collects and analyzes data obtained through the Twitter streaming API. We develop a parallel query evaluation strategy that can explore the customized index structures efficiently, and test it on a set of typical social media data queries. We evaluate the performance of IndexedHBase on FutureGrid and compare it with Riak, a widely adopted commercial NoSQL database system. The results show that IndexedHBase loads data six times faster than Riak and is significantly more efficient in evaluating queries involving large result sets.
Details
- ISBN :
- 978-1-4939-1904-8
- ISBNs :
- 9781493919048
- Database :
- OpenAIRE
- Journal :
- Cloud Computing for Data-Intensive Applications (ISBN: 9781493919048)
- Accession number :
- edsair.doi...........eb7fc5c4c2245f620d8d310e0559ff59