
Deister: A light-weight autonomous block management in data-intensive file systems using deterministic declustering distribution.

Authors :
Wang, Jun
Zhang, Xuhong
Zhang, Junyao
Yin, Jiangling
Han, Dezhi
Wang, Ruijun
Huang, Dan
Source :
Journal of Parallel & Distributed Computing. Oct2017, Vol. 108, p3-13. 11p.
Publication Year :
2017

Abstract

During the last few decades, Data-intensive File Systems (DiFS), such as Google File System (GFS) and Hadoop Distributed File System (HDFS), have become the key storage architectures for big data processing. These storage systems usually divide files into fixed-size blocks (or chunks). Each block is replicated (usually three-way) and distributed pseudo-randomly across the cluster. The master node (namenode) uses a huge table to record the locations of each block and its replicas. However, as data size increases, the block location table and its maintenance can occupy more than half of the memory space and 30% of the processing capacity of the master node, which severely limits the master node's scalability and performance. We argue that physical data distribution and maintenance should be separated from metadata management and performed by each storage node autonomously. In this paper, we propose Deister, a novel block management scheme built on an invertible deterministic declustering distribution method called Intersected Shifted Declustering (ISD). Deister is amenable to current research on scaling namespace management in the master node. In Deister, the huge table for maintaining block locations in the master node is eliminated, and the block-node mapping is maintained autonomously on each data node. Results show that, compared with the HDFS default configuration, Deister achieves identical performance while saving about half of the RAM space and 30% of the processing capacity of the master node, and it is expected to scale to double the size of a current single-namenode HDFS cluster, pushing the scalability bottleneck of the master node back to namespace management. [ABSTRACT FROM AUTHOR]
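
The idea of replacing a block location table with an invertible deterministic mapping can be illustrated with a minimal sketch. This is not the ISD method from the paper, whose details are not given in the abstract; it uses a simple rotational placement as a stand-in, and the function names and parameters (replica_nodes, blocks_on_node, num_nodes, total_blocks) are hypothetical. It shows only the two properties the abstract relies on: any party can compute a block's replica nodes without a lookup table, and each node can derive for itself which blocks it should hold.

```python
# Toy rotational placement, used here as a stand-in for ISD (an assumption,
# not the paper's algorithm). Nodes and blocks are identified by integers.

def replica_nodes(block_id: int, num_nodes: int, replicas: int = 3) -> list[int]:
    """Forward mapping: compute the nodes holding a block, with no table lookup."""
    if not 0 < replicas <= num_nodes:
        raise ValueError("need 0 < replicas <= num_nodes")
    # Replica i of a block is placed i positions after the block's home node.
    return [(block_id + i) % num_nodes for i in range(replicas)]


def blocks_on_node(node: int, num_nodes: int, total_blocks: int,
                   replicas: int = 3) -> list[int]:
    """Inverse mapping: a node derives which blocks it should hold."""
    # A node holds block b exactly when it lies within `replicas` positions
    # after b's home node, which inverts the rule used in replica_nodes.
    return [b for b in range(total_blocks) if (node - b) % num_nodes < replicas]
```

With 5 nodes and three-way replication, replica_nodes(7, 5) gives nodes 2, 3, and 4, and node 2 can list its own blocks with blocks_on_node(2, 5, 10) without consulting a master. A real scheme would also need to balance recovery load and handle node changes, which this sketch does not attempt.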

Details

Language :
English
ISSN :
07437315
Volume :
108
Database :
Academic Search Index
Journal :
Journal of Parallel & Distributed Computing
Publication Type :
Academic Journal
Accession number :
123530416
Full Text :
https://doi.org/10.1016/j.jpdc.2016.03.005