Back to Search Start Over

Error Log Analysis and System Optimization for Lustre Cluster Storage

Authors :
CHENG Wen, LI Yan, ZENG Ling-fang, WANG Fang, TANG Shi-cheng, YANG Li-ping, FENG Dan, ZENG Wen-jun
Source :
Jisuanji kexue, Vol 49, Iss 10, Pp 1-9 (2022)
Publication Year :
2022
Publisher :
Editorial office of Computer Science, 2022.

Abstract

Cluster storage system error messages can help to optimize the availability and reliability of storage system.Previous research of storage system error analysis focuses on the local file system or a part of the cluster storage system.There is a lack of research on storage system error messages for a long-time and multi-dimension in practical applications.With the continuous integration of new functional modules,the cluster storage system is becoming more and more complex,and the errors caused by cluster storage system emerge endlessly,which brings troubles and challenges to the researcher and developer.To address the pro-blems,we conduct a comprehensive study of the Lustre system error log.By collecting the error log in 1 673 consecutive days,we study nearly 2.26 GB of Lustre error logs,analyze the characteristics and problems of the Lustre system errors in multiple Lustre versions.We show that correlated errors between different subsystems and study the possible impacting factors on different Lustre versions.We also summarize the common errors in the Lustre system and show the corresponding solutions.We derive nume-rous new insights into the Lustre system development process and report 14 findings.Finally,we collect new error logs for 333 consecutive days to verify the 14 findings and give some cases about error optimization.Experimental results show that the error optimization cases can significantly reduce the number of errors and improve the availability and stability of the system.Our results and suggestions should be useful for both the development of the cluster storage system themselves as well as the Lustre operation and maintenance.

Details

Language :
Chinese
ISSN :
1002137X
Volume :
49
Issue :
10
Database :
Directory of Open Access Journals
Journal :
Jisuanji kexue
Publication Type :
Academic Journal
Accession number :
edsdoj.b3131028c68641be9b5306e0b437b9a5
Document Type :
article
Full Text :
https://doi.org/10.11896/jsjkx.220100134