1. IRHT: An SDC detection and recovery architecture based on value locality of instruction binary codes.
- Author
- Tajary, Alireza, Zarandi, Hamid R., and Bagherzadeh, Nader
- Subjects
- *BINARY codes, *REED-Solomon codes, *DATA corruption, *PLURALITY voting, *SYSTEM failures
- Abstract
Silent Data Corruptions (SDCs), those errors that escape detection methods, are critical for system designers because they may result in system failures. In order to catch SDCs, mechanisms should focus on the behavioural aspects of errors in addition to their physical location or error patterns. Therefore, protection codes like parity, Hamming, and Reed-Solomon codes, which depend heavily on the physical location of data bits, are not enough in processors for the detection of computing errors. By characterizing data behaviour during program execution, we have observed value locality in the destination-register results of each instruction binary code (instruction opcode and operand codes). This locality exists not only in the results of each instruction, but also in the results of instructions at different memory locations that have the same binary code. As a result, an architecture called the Instruction Result History Table (IRHT) is presented, which is indexed by the instruction binary code. In the IRHT, a history of the values produced by the same instruction binary code is stored and consulted during each instruction execution cycle. Any mismatch between the values stored in the IRHT and those generated by the current execution indicates an SDC syndrome. To confirm an SDC with a high level of confidence, the current instruction is executed a second time; this duplicate execution confirms whether an SDC occurred. If an SDC is confirmed, a third execution of the instruction, combined with majority voting, frees the system of the SDC. Extensive simulations showed that up to 83.54% of SDCs are detectable with the help of this locality. Moreover, with a small IRHT of 16 kB, 80.66% of SDCs can be detected on average. Note that the presented method can detect errors that escape conventional detection mechanisms, so it can be used in conjunction with other conventional methods. [ABSTRACT FROM AUTHOR]
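
The detect, confirm, and recover flow described in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design: the class `IRHT`, the function `checked_execute`, the table capacity, the history depth, and the oldest-first eviction policy are all hypothetical choices made for illustration, and `execute` stands in for whatever produces the destination-register value.

```python
from collections import Counter, OrderedDict, deque


class IRHT:
    """Toy Instruction Result History Table keyed by instruction binary code."""

    def __init__(self, max_entries=1024, history_depth=4):
        # Both sizes are illustrative assumptions, not values from the paper.
        self.max_entries = max_entries
        self.history_depth = history_depth
        self.table = OrderedDict()  # binary code -> deque of recent results

    def seen(self, code, value):
        """True if `value` is already in the history stored for `code`."""
        history = self.table.get(code)
        return history is not None and value in history

    def record(self, code, value):
        """Add `value` to the history for `code`, creating the entry if needed."""
        history = self.table.get(code)
        if history is None:
            if len(self.table) >= self.max_entries:
                # Evict the oldest-inserted entry (assumed replacement policy).
                self.table.popitem(last=False)
            history = self.table[code] = deque(maxlen=self.history_depth)
        if value not in history:
            history.append(value)


def checked_execute(irht, code, execute):
    """Run `execute()` once; re-run only when the result misses the history."""
    first = execute()
    if irht.seen(code, first):
        return first, "match"            # value locality holds, no re-execution
    second = execute()                   # mismatch: duplicate execution
    if second == first:
        irht.record(code, first)         # a new but legitimate value
        return first, "confirmed-clean"
    third = execute()                    # disagreement: third run, then vote
    value, votes = Counter([first, second, third]).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority among three executions")
    irht.record(code, value)
    return value, "recovered"
```

In this sketch a result that is already in the history is accepted without re-execution, so a corrupted value that happens to coincide with a stored one would go unnoticed; that behaviour is a property of the sketch and is not a claim about the published design.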
- Published
- 2020