
Reparo: A Fast RAID Recovery Scheme for Ultra-large SSDs

Authors :
Ko Minseok
Sungjin Lee
Myoungjun Chun
Yoona Kim
Ha Keon-Soo
Jihong Kim
Du-Won Hong
Source :
ACM Transactions on Storage. 17:1-24
Publication Year :
2021
Publisher :
Association for Computing Machinery (ACM), 2021.

Abstract

A recent ultra-large SSD (e.g., a 32-TB SSD) provides many benefits for building cost-efficient enterprise storage systems. Owing to its large capacity, however, when such an SSD fails in a RAID storage system, a long rebuild is inevitable, because RAID reconstruction requires copying a huge amount of data among SSDs. Motivated by the failure characteristics of modern SSDs, we propose a new recovery scheme, called reparo, for RAID storage systems with ultra-large SSDs. Unlike existing RAID recovery schemes, reparo repairs a failed SSD at NAND die granularity without replacing it with a new SSD, thus avoiding most of the inter-SSD data copies during RAID recovery. When a NAND die of an SSD fails, reparo exploits the multi-core processor of the SSD controller to identify the LBAs whose data resided on the failed NAND die and to recover the data of those LBAs. Furthermore, reparo ensures no negative post-recovery impact on the performance and lifetime of the repaired SSD. Experimental results using 32-TB enterprise SSDs show that reparo can recover from a NAND die failure about 57 times faster than the existing rebuild method, with little degradation in SSD performance and lifetime observed after recovery.
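The abstract states only that reparo uses the controller's multi-core processor to find the failed LBAs and then recovers their data; it does not describe the data structures involved. The Python sketch below is a hypothetical illustration under simplifying assumptions, not the authors' implementation. A dictionary stands in for the FTL's logical-to-physical (L2P) mapping table, a thread pool stands in for the controller cores, and recovery is assumed to be RAID-5-style XOR over the surviving chunks of each stripe. `PhysAddr`, `find_failed_lbas`, `xor_blocks`, `recover_die`, and the `peer_chunks` callback are names invented for this sketch.

```python
"""Hypothetical sketch of die-granularity recovery in an SSD-based RAID-5 array.

Not the reparo implementation: the L2P table, the per-core partitioning, and
the XOR-based stripe reconstruction are simplifying assumptions.
"""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysAddr:
    die: int   # NAND die index inside the SSD (assumed layout)
    page: int  # page index inside that die (assumed layout)


def find_failed_lbas(l2p, failed_die, n_workers=4):
    """Return the sorted LBAs whose data lived on `failed_die`.

    The mapping table is split into contiguous slices, one per worker,
    loosely modelling one controller core per slice. In CPython the GIL
    means this shows the partitioning, not a real speedup.
    """
    items = list(l2p.items())
    chunk = max(1, (len(items) + n_workers - 1) // n_workers)
    slices = [items[i:i + chunk] for i in range(0, len(items), chunk)]

    def scan(part):
        return [lba for lba, addr in part if addr.die == failed_die]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(scan, slices))
    return sorted(lba for part in results for lba in part)


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 parity arithmetic)."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def recover_die(l2p, failed_die, peer_chunks):
    """Rebuild only the LBAs that mapped to the failed die.

    `peer_chunks(lba)` is a hypothetical callback returning the surviving
    data and parity chunks of that LBA's stripe, read from the other SSDs.
    """
    return {lba: xor_blocks(peer_chunks(lba))
            for lba in find_failed_lbas(l2p, failed_die)}


if __name__ == "__main__":
    # Toy mapping: 8 LBAs spread over 4 dies (lba % 4 selects the die).
    l2p = {lba: PhysAddr(die=lba % 4, page=lba // 4) for lba in range(8)}
    # Toy stripe contents: two surviving data chunks on peer SSDs.
    d1, d2 = bytes([1, 2]), bytes([4, 8])
    lost = {1: bytes([16, 32]), 5: bytes([64, 128])}

    def peer_chunks(lba):
        parity = xor_blocks([lost[lba], d1, d2])  # parity written before failure
        return [d1, d2, parity]

    rebuilt = recover_die(l2p, failed_die=1, peer_chunks=peer_chunks)
    print(sorted(rebuilt))  # [1, 5]
    print(rebuilt == lost)  # True
```

The sketch is meant to show the scope of the work rather than its speed. Only the LBAs that mapped to the failed die are reconstructed, instead of the whole SSD's capacity. This matches the abstract's stated intuition for avoiding most inter-SSD data copies.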

Details

ISSN :
1553-3093 and 1553-3077
Volume :
17
Database :
OpenAIRE
Journal :
ACM Transactions on Storage
Accession number :
edsair.doi...........dcf6e80d2253fd9c3cafe2c307b61ae9
Full Text :
https://doi.org/10.1145/3450977