Correctable and uncorrectable errors using large scale DRAM DIMMs in replacement network servers.

Authors :
Baeg, Sanghyeon
Qasim, Mirza
Kwon, Junhyeong
Li, Tan
Gupta, Nilay
Wen, Shi-Jie
Kolli, Satyadev
Source :
Microelectronics Reliability. Aug 2019, Vol. 99, p104-112. 9p.
Publication Year :
2019

Abstract

This paper investigated DRAM DIMM errors using field records from replacement network servers. A large sample of about 40 K DRAM DIMMs was collected over a 2.5-year period from 23 different server types. The sample included DIMMs from three different DRAM manufacturers, with densities between 4 and 128 GB and speeds between 1066 and 2400 Mbps. Errors that occurred during system operation were classified as either correctable errors (CEs) or uncorrectable errors (UEs) based on the error correction code (ECC) schemes built into the servers. Of the collected DIMMs, 24% had recorded errors, with CE-only, UE-only, and combined UE and CE cases comprising 28%, 43%, and 29% of recorded errors, respectively. Since UEs can cause large-scale failures, systems are replaced upon any UE occurrence. Approximately half of the UE-only DIMMs had a single UE. In contrast, many DIMMs had billions of CEs, which can arise when a faulty location is accessed repeatedly. Such drastic differences between UE and CE counts help explain the importance of ECC and error mitigation schemes. Comparative analyses of errors were made across manufacturers and operating speeds. After reasonable adjustments for repetitive error counts, failure in time (FIT) rates differed by up to 38% across manufacturers. Higher-speed DIMMs generally had higher FIT, with 2400 Mbps DIMMs exhibiting 6.7 times the FIT of 1066 Mbps DIMMs. [ABSTRACT FROM AUTHOR]
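
For readers unfamiliar with the FIT metric used in the abstract, the sketch below shows the conventional definition: failures per 10^9 device-hours. It is only an illustration of the unit, not the authors' procedure. The function name `fit_rate` and every input number are hypothetical and not taken from the paper, and the paper's adjustment for repetitive error counts is not modeled here.

```python
def fit_rate(failures: int, devices: int, hours_per_device: float) -> float:
    """Return failures in time (FIT): failures per 1e9 device-hours."""
    device_hours = devices * hours_per_device
    if device_hours <= 0:
        raise ValueError("device-hours must be positive")
    return failures / device_hours * 1e9


# Hypothetical inputs for illustration only (not data from the paper):
# 5 failures observed across 1000 DIMMs, each in service for one year.
HOURS_PER_YEAR = 24 * 365
example_fit = fit_rate(failures=5, devices=1000, hours_per_device=HOURS_PER_YEAR)
print(f"{example_fit:.1f} FIT")
```

Under this definition, a statement such as "6.7 times the FIT" compares failure counts normalized by accumulated device-hours, so populations of different sizes and service durations can be compared directly.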

Details

Language :
English
ISSN :
0026-2714
Volume :
99
Database :
Academic Search Index
Journal :
Microelectronics Reliability
Publication Type :
Academic Journal
Accession number :
139387550
Full Text :
https://doi.org/10.1016/j.microrel.2019.05.008