Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs

Authors :: Iakymchuk, Roman
Graillat, Stef
Defour, David
Quintana-Ortí, Enrique
Royal Institute of Technology [Stockholm] (KTH)
Performance et Qualité des Algorithmes Numériques (PEQUAN)
Laboratoire d'Informatique de Paris 6 (LIP6)
Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)
Digits, Architectures et Logiciels Informatiques (DALI)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Université de Perpignan Via Domitia (UPVD)
Universitat Jaume I
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Perpignan Via Domitia (UPVD)
Source :: The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States
Publication Year :: 2016
Publisher :: HAL CCSD, 2016.
Abstract: We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we provide Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via inexpensive iterative refinement. Following a bottom-up approach, we finally construct a reproducible implementation of the LU factorization for GPUs, which can easily accommodate partial pivoting for stability and can eventually be integrated into a (blocked) high-performance and stable algorithm for the LU factorization.
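
Illustrative note (not part of the record): the bottom-up idea in the abstract, building an unblocked LU factorization on top of a dot product that is rounded only once, can be sketched on the CPU as follows. This is a minimal sketch under stated assumptions, not the authors' GPU kernels: exact rational arithmetic from Python's `fractions` module stands in for the long accumulator and error-free transformations named in the subjects, and the function names `exact_dot` and `lu_unblocked` are hypothetical. Pivoting is omitted, so a zero pivot raises an error.

```python
from fractions import Fraction


def exact_dot(x, y):
    """Dot product accumulated exactly, then rounded once to a double.

    Assumption: exact rationals replace a long accumulator here.
    """
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)  # exact, no intermediate rounding
    return float(acc)  # single, correctly rounded conversion


def lu_unblocked(A):
    """Unblocked Doolittle LU without pivoting (hypothetical helper).

    Each entry of L and U is formed exactly and rounded once, so the
    result does not depend on the order of the inner summations.
    """
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # U[i][j] = A[i][j] - sum_k L[i][k] * U[k][j]
            s = sum((Fraction(L[i][k]) * Fraction(U[k][j]) for k in range(i)),
                    Fraction(0))
            U[i][j] = float(Fraction(A[i][j]) - s)
        for j in range(i + 1, n):
            # L[j][i] = (A[j][i] - sum_k L[j][k] * U[k][i]) / U[i][i]
            s = sum((Fraction(L[j][k]) * Fraction(U[k][i]) for k in range(i)),
                    Fraction(0))
            L[j][i] = float((Fraction(A[j][i]) - s) / Fraction(U[i][i]))
    return L, U
```

Because the accumulation is exact, `exact_dot` returns the same value for any ordering of its inputs, which is the property that a parallel reduction on a GPU needs in order to be reproducible. Exact rationals are far slower than the kernels the abstract describes; the sketch only shows the layering.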

Subjects :: accuracy
long accumulator
GPUs
BLAS
[INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic
LU factorization
[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
reproducibility
error-free transformation
[MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]

Details

Language :: English
Database :: OpenAIRE
Journal :: The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States
Accession number :: edsair.dedup.wf.001..8773c879e7f65722922504354718e378
