Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs
- Source :
- The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States
- Publication Year :
- 2016
- Publisher :
- HAL CCSD, 2016.
Abstract
- We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we provide Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via inexpensive iterative refinement. Following a bottom-up approach, we finally construct a reproducible implementation of the LU factorization for GPUs, which can easily accommodate partial pivoting for stability and can eventually be integrated into a (blocked) high-performance and stable algorithm for the LU factorization.
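The abstract does not give the kernels themselves. The following is a minimal sequential Python sketch of the two ideas it combines: a correctly rounded dot product obtained by summing exact products in a long accumulator, and an unblocked LU factorization with partial pivoting built bottom-up from that primitive. Python's arbitrary-precision integers stand in for the GPU long accumulator, and the names `SCALE_BITS`, `long_acc`, `rdot` and `lu_reproducible` are hypothetical. The left-looking variant is an illustrative choice, not necessarily the one used in the paper, and the iterative refinement of the triangular solve is not shown.

```python
# A sketch, NOT the paper's GPU kernels: Python's arbitrary-precision ints play
# the role of a long (fixed-point) accumulator, so sums of products are exact
# and only the final conversion back to float rounds.

# Every finite double is an integer multiple of 2**-1074, so every product of
# two doubles is an integer multiple of 2**-2148 (hypothetical constant name).
SCALE_BITS = 2 * 1074
ONE = 1 << SCALE_BITS


def to_fixed(x: float) -> int:
    """Exact fixed-point image of a finite double (as_integer_ratio is exact)."""
    num, den = x.as_integer_ratio()  # den is a power of two dividing ONE
    return (num * ONE) // den


def prod_fixed(x: float, y: float) -> int:
    """Exact fixed-point image of the real product x*y (no rounding)."""
    nx, dx = x.as_integer_ratio()
    ny, dy = y.as_integer_ratio()
    return (nx * ny * ONE) // (dx * dy)


def long_acc(c: float, xs, ys) -> int:
    """Exact value of c + sum(x*y) held in the long accumulator.
    Integer addition is associative, so the order of the terms is irrelevant."""
    acc = to_fixed(c)
    for x, y in zip(xs, ys):
        acc += prod_fixed(x, y)
    return acc


def round_fixed(acc: int) -> float:
    """Single final rounding: CPython's int/int true division rounds
    correctly to nearest (raises OverflowError if out of float range)."""
    return acc / ONE


def rdot(xs, ys) -> float:
    """Correctly rounded, order-independent dot product."""
    return round_fixed(long_acc(0.0, xs, ys))


def naive_dot(xs, ys) -> float:
    """Ordinary left-to-right accumulation, rounding after every step."""
    s = 0.0
    for x, y in zip(xs, ys):
        s += x * y
    return s


def lu_reproducible(A):
    """Unblocked left-looking LU with partial pivoting, P*A = L*U.
    Returns (M, perm): unit-lower L strictly below the diagonal of M,
    U on and above it, and row i of P*A equal to A[perm[i]]."""
    n = len(A)
    M = [[float(v) for v in row] for row in A]
    perm = list(range(n))
    for j in range(n):
        # Triangular solve for U[0:j, j] with unit-lower L: each entry is
        # one correctly rounded a_ij - sum_k l_ik * u_kj.
        for i in range(j):
            M[i][j] = round_fixed(
                long_acc(M[i][j], M[i][:i], [-M[k][j] for k in range(i)]))
        # Matrix-vector update of the rest of column j, same fused operation.
        for i in range(j, n):
            M[i][j] = round_fixed(
                long_acc(M[i][j], M[i][:j], [-M[k][j] for k in range(j)]))
        # Partial pivoting; max() keeps the first of equal maxima, so
        # ties are broken deterministically by the smallest row index.
        p = max(range(j, n), key=lambda i: abs(M[i][j]))
        if p != j:
            M[j], M[p] = M[p], M[j]
            perm[j], perm[p] = perm[p], perm[j]
        if M[j][j] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        # Vector scaling: each IEEE division is itself correctly rounded.
        for i in range(j + 1, n):
            M[i][j] /= M[j][j]
    return M, perm
```

With `xs = [1e16, 1.0, -1e16]` and `ys = [1.0, 1.0, 1.0]`, `rdot` returns `1.0` for every ordering of the terms, whereas `naive_dot` returns `0.0` or `1.0` depending on the order. Because every entry of L and U is the correctly rounded value of an exact expression, the factors do not depend on the order in which a parallel reduction would combine the products.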
- Subjects :
- accuracy
- long accumulator
- GPUs
- BLAS
- [INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic
- LU factorization
- [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
- reproducibility
- error-free transformation
- [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States
- Accession number :
- edsair.dedup.wf.001..8773c879e7f65722922504354718e378