Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs

Authors :
Iakymchuk, Roman
Graillat, Stef
Defour, David
Quintana-Ortí, Enrique
Royal Institute of Technology [Stockholm] (KTH)
Performance et Qualité des Algorithmes Numériques (PEQUAN)
Laboratoire d'Informatique de Paris 6 (LIP6)
Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)
Digits, Architectures et Logiciels Informatiques (DALI)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Université de Perpignan Via Domitia (UPVD)
Universitat Jaume I
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Perpignan Via Domitia (UPVD)
Source :
The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States
Publication Year :
2016
Publisher :
HAL CCSD, 2016.

Abstract

We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we provide Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via inexpensive iterative refinement. Following a bottom-up approach, we finally construct a reproducible implementation of the LU factorization for GPUs, which can easily accommodate partial pivoting for stability and can eventually be integrated into a (blocked) high-performance and stable algorithm for the LU factorization.
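
As a rough illustration of the bottom-up idea in the abstract, the following minimal CPU sketch in Python builds an unblocked LU factorization on top of a dot-product kernel whose result does not depend on summation order. It is not the authors' GPU implementation: exact rational accumulation with `fractions.Fraction` is an assumed stand-in for whatever technique the paper uses, the names `exact_dot`, `exact_residual`, and `lu_unblocked` are hypothetical, and pivoting is omitted, so nonzero pivots are assumed.

```python
from fractions import Fraction


def exact_dot(x, y):
    """Dot product accumulated exactly, then rounded once to a float.

    Because the accumulation is exact, the result is the same for any
    ordering of the terms (assumed stand-in for a reproducible kernel).
    """
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)
    return float(acc)


def exact_residual(a, x, y):
    """Compute a - dot(x, y) with a single final rounding."""
    acc = Fraction(a)
    for p, q in zip(x, y):
        acc -= Fraction(p) * Fraction(q)
    return float(acc)


def lu_unblocked(A):
    """Doolittle-style unblocked LU without pivoting (illustrative only).

    Returns (L, U) with L unit lower triangular and U upper triangular.
    Assumes every pivot U[i][i] is nonzero.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U: each entry is one order-independent residual.
        for j in range(i, n):
            col = [U[k][j] for k in range(i)]
            U[i][j] = exact_residual(A[i][j], L[i][:i], col)
        # Column i of L: residual followed by one rounded division.
        for j in range(i + 1, n):
            col = [U[k][i] for k in range(i)]
            L[j][i] = exact_residual(A[j][i], L[j][:i], col) / U[i][i]
    return L, U


if __name__ == "__main__":
    x = [1e16, 1.0, -1e16]
    y = [1.0, 1.0, 1.0]
    # Same value regardless of the order in which terms are accumulated.
    print(exact_dot(x, y), exact_dot(x[::-1], y[::-1]))
    print(lu_unblocked([[4.0, 3.0], [6.0, 3.0]]))
```

Exact rational arithmetic is far too slow for a production kernel; it is used here only because it makes the order independence easy to see and to check.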

Details

Language :
English
Database :
OpenAIRE
Journal :
The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States
Accession number :
edsair.dedup.wf.001..8773c879e7f65722922504354718e378