1. FlexGripPlus: An improved GPGPU model to support reliability analysis.
- Author
- Condia, Josie E. Rodriguez; Du, Boyang; Sonza Reorda, Matteo; Sterpone, Luca
- Subjects
- *GRAPHICS processing units, *RELIABILITY in engineering, *AUTONOMOUS vehicles, *HIGH performance computing, *SOFT errors, *AUTOMATED guided vehicle systems
- Abstract
General Purpose Graphics Processing Units (GPGPUs) have been extensively used in the last decade as accelerators in highly demanding applications, such as multimedia processing and high-performance computing. Nowadays, these devices are becoming popular even in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, these devices can suffer from transient faults, such as those produced by radiation. Among these, Single Event Upsets (SEUs), which are the focus of this paper, can cause application misbehaviors that may lead to catastrophic consequences. In this work, we first describe how we extended the capabilities of an open-source VHDL GPGPU model (FlexGrip) and developed a new version, named FlexGripPlus, to study and analyze the effects of SEUs in a GPGPU in much greater detail. We also performed extensive fault injection campaigns using FlexGripPlus, which allowed us to identify the most critical effects within the GPGPU architecture. We finally focused on the scheduler controller, since it is a module specific to the GPGPU architecture, and showed that its SEU sensitivity varies depending on the affected location. Moreover, we present the results of additional analyses varying the number of parallel execution units in the system, demonstrating the correlation between the number of execution units in a GPGPU and the system reliability.
The paper:
• describes a new model of a GPGPU (named FlexGripPlus) derived from FlexGrip and compliant with the NVIDIA G80 GPUs;
• highlights a multi-platform fault injection environment aimed at evaluating the sensitivity to transient faults;
• reports extensive results showing that different modules behave rather differently and show different sensitivity to SEUs;
• evaluates the fault sensitivity of a GPGPU, analyzing the impact of different workloads, configurations and coding styles.
[ABSTRACT FROM AUTHOR]
- Published
- 2020