Fine-grained bit-flip protection for relaxation methods

Authors :
Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors
U.S. Department of Energy
European Regional Development Fund
Ministerio de Economía y Competitividad
Anzt, Hartwig
Dongarra, Jack
Quintana Ortí, Enrique Salvador
Publication Year :
2019

Abstract

Resilience is considered a challenging, under-addressed issue that the high-performance computing (HPC) community will have to face in order to produce reliable Exascale systems by the beginning of the next decade. As part of a push toward a resilient HPC ecosystem, in this paper we propose an error-resilient iterative solver for sparse linear systems based on stationary component-wise relaxation methods. Starting from a plain implementation of the Jacobi iteration, our approach introduces a low-cost component-wise technique that detects bit-flips and rejects some component updates, which turns the initially synchronized solver into an asynchronous iteration. Our experimental study, which uses sparse incomplete factorizations from a collection of real-world applications and a practical GPU implementation, exposes both the convergence delay incurred by the fault-tolerant implementation and its practical performance.
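
The idea of rejecting individual component updates can be illustrated with a minimal sketch. The abstract does not state the detection criterion, so the check below is an assumption made purely for illustration: an update is rejected if it is not finite, or if it changes the component by more than `slack` times the largest accepted change of the previous sweep. This heuristic relies on the fact that, for a strictly diagonally dominant matrix, the largest component change does not grow from one synchronous Jacobi sweep to the next. The names `flip_bit`, `protected_jacobi`, `slack`, and `fault` are hypothetical, and the sketch is plain CPU Python rather than the GPU implementation the abstract refers to.

```python
import math
import struct


def flip_bit(value, bit):
    """Return `value` with one bit of its IEEE 754 double encoding flipped."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def protected_jacobi(A, b, tol=1e-10, max_sweeps=200, slack=2.0, fault=None):
    """Jacobi sweeps with a per-component plausibility check.

    ASSUMPTION: the check used here is an illustrative heuristic, not the
    criterion from the paper. `fault` is a hypothetical
    (sweep, component, bit) triple that simulates a single bit-flip.
    Returns (solution, sweeps performed, number of rejected updates).
    """
    n = len(b)
    x = [0.0] * n
    limit = math.inf  # no history yet, so the first sweep is unchecked
    rejected = 0
    for sweep in range(max_sweeps):
        new_x = list(x)
        largest = 0.0
        clean = True
        for i in range(n):
            # Standard Jacobi update, computed from the previous iterate only.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            cand = (b[i] - s) / A[i][i]
            if fault is not None and fault[:2] == (sweep, i):
                cand = flip_bit(cand, fault[2])  # simulated soft error
            change = abs(cand - x[i])
            if not math.isfinite(cand) or change > limit:
                # Reject: this component keeps its old value for one sweep.
                rejected += 1
                clean = False
                continue
            new_x[i] = cand
            largest = max(largest, change)
        x = new_x
        if clean:
            if largest < tol:
                return x, sweep + 1, rejected
            limit = slack * largest
        # After a rejection the limit is deliberately not tightened, so the
        # stale component can catch up in the next sweep.
    return x, max_sweeps, rejected
```

For example, with `A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]` and `b = [2, 4, 10]` (exact solution `[1, 2, 3]`), calling `protected_jacobi(A, b, fault=(5, 0, 62))` flips the top exponent bit of one update, rejects that single update, and still converges. A rejected component is simply computed again from newer neighbor values in the following sweep, which is the sense in which the iteration stops being fully synchronized. A fault in the very first sweep would not be caught by this particular heuristic, since no change history exists yet.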

Details

Database :
OAIster
Notes :
TEXT, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1258889757
Document Type :
Electronic Resource