
Optimizing small file storage process of the HDFS which based on the indexing mechanism

Authors :
Miaomiao Zhou
Wenjuan Cheng
Bing Tong
Junhong Zhu
Source :
2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA).
Publication Year :
2017
Publisher :
IEEE, 2017.

Abstract

As an open-source implementation of GFS, the Hadoop Distributed File System (HDFS) handles large files efficiently. However, because of its master-slave architecture and the way it stores metadata, its efficiency drops when dealing with massive numbers of small files: they occupy a large amount of NameNode memory, reduce access efficiency, and delay concurrent user access. To improve this performance, this paper studies methods for processing small files on HDFS and, following the file storage process, proposes a small-file processing scheme based on an index mechanism. Before a file is uploaded to the HDFS cluster, its size is measured. If it is a small file, it is indexed and merged with other small files, and an index file is created to record the small file's index information. The scheme also introduces a distributed caching strategy to further optimize small-file I/O operations and improve read speed. Experimental results show that, compared with the original HDFS and the HAR scheme, this scheme greatly improves file retrieval efficiency and reduces the consumption of memory resources.
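The merge-and-index step the abstract describes can be illustrated with a minimal sketch. The class below is a hypothetical, simplified model (not the authors' implementation): files below a size threshold are appended into one merged blob, and each file's (offset, length) is recorded in an index so it can be read back without a separate NameNode metadata entry per file. The 4 MB cutoff, the class name, and the in-memory blob standing in for the merged HDFS file are all illustrative assumptions.

```python
import io


class SmallFileMerger:
    """Sketch of an index-based small-file merge (illustrative only).

    Files below `threshold` bytes are appended to a single merged blob
    and their (offset, length) recorded in an index dict; larger files
    are left to be uploaded to HDFS directly.
    """

    def __init__(self, threshold=4 * 1024 * 1024):
        self.threshold = threshold       # hypothetical small-file cutoff
        self.blob = io.BytesIO()         # stand-in for the merged HDFS file
        self.index = {}                  # filename -> (offset, length)

    def put(self, name, data):
        """Merge a small file; return False if it exceeds the threshold."""
        if len(data) >= self.threshold:
            return False                 # large file: store directly (not modeled)
        offset = self.blob.tell()
        self.blob.write(data)
        self.index[name] = (offset, len(data))
        return True

    def get(self, name):
        """Read a merged small file back via its index entry."""
        offset, length = self.index[name]
        self.blob.seek(offset)
        return self.blob.read(length)
```

In a real deployment the index file would itself be persisted on HDFS, and the distributed cache mentioned in the abstract would hold recently read index entries and blocks to speed up repeated small-file reads.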

Details

Database :
OpenAIRE
Journal :
2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA)
Accession number :
edsair.doi...........f4424e4596a96c810d1a00f754d78f79
Full Text :
https://doi.org/10.1109/icccbda.2017.7951882