
Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki Scheme

Authors :: Takeshi Ogita
Daichi Mukunoki
Katsuhisa Ozaki
Roman Iakymchuk
RIKEN Center for Computational Science [Kobe] (RIKEN CCS)
RIKEN - Institute of Physical and Chemical Research [Japon] (RIKEN)
Shibaura Institute of Technology
Tokyo Woman's Christian University, Department of Mathematics
Tokyo Woman's Christian University (TWCU)
Performance et Qualité des Algorithmes Numériques (PEQUAN)
LIP6
Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)
Iakymchuk, Roman
Source :: HPC Asia, The International Conference on High Performance Computing in Asia-Pacific Region
Publication Year :: 2020
Publisher :: HAL CCSD, 2020.
Abstract: In Krylov subspace methods such as the Conjugate Gradient (CG) method, the number of iterations until convergence may increase because rounding errors in floating-point computations reduce accuracy. In addition, because the order of operations is non-deterministic in parallel computations, the result and the convergence behavior may differ between environments, even for the same input. This paper presents a new approach to the CG method that provides high accuracy as well as bit-level reproducibility of computed solutions on many-core processors, including both x86 CPUs and NVIDIA GPUs. In the proposed approach, accurate and reproducible operations are used for all inner-product-based operations, such as matrix-vector multiplication and dot product, which are the main sources of non-reproducibility in the CG method. These operations are performed using the Ozaki scheme, an error-free transformation for the dot product that can ensure correct rounding. Because this method can be built upon vendor-provided linear algebra libraries such as Intel Math Kernel Library and NVIDIA cuBLAS/cuSPARSE, it reduces development cost. Using examples in which convergence and computed solutions differ across platforms, we demonstrate the applicability and effectiveness of the proposed approach as well as its performance on both CPUs and GPUs. We also compare it against an existing accurate and reproducible CG implementation based on Exact BLAS (ExBLAS) on CPUs.
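
The abstract's central point, that a correctly rounded dot product does not depend on the order of operations while an ordinary floating-point one does, can be illustrated with a minimal Python sketch. This is not the Ozaki scheme and not the authors' code: exact rational arithmetic (`fractions.Fraction`) stands in for the error-free transformation, and the function names `dot_naive` and `dot_correctly_rounded` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def dot_naive(x, y):
    """Ordinary double-precision dot product, accumulated left to right.

    Every addition rounds, so the result depends on the element order.
    """
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s


def dot_correctly_rounded(x, y):
    """Dot product computed exactly, then rounded once to double.

    Assumption for illustration: exact rational arithmetic replaces the
    error-free transformation described in the abstract. Each double is
    converted to an exact Fraction, so the sum carries no rounding error;
    the only rounding is the final conversion to float.
    """
    total = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
    return float(total)


# Same data, two different summation orders (as could happen when a
# parallel reduction is scheduled differently on two platforms).
y = [1.0, 1.0, 1.0]
order_a = [1e16, 1.0, -1e16]
order_b = [1e16, -1e16, 1.0]

print(dot_naive(order_a, y))              # 0.0: the 1.0 is absorbed by 1e16
print(dot_naive(order_b, y))              # 1.0
print(dot_correctly_rounded(order_a, y))  # 1.0
print(dot_correctly_rounded(order_b, y))  # 1.0
```

Because the exact sum is a single well-defined number, rounding it once gives the same bits for any ordering, which is the property that makes bit-wise agreement between different platforms possible. Exact rational arithmetic is far too slow for a real solver; it is used here only to keep the sketch short.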

Subjects :: Computer science
Computation
GPU
010103 numerical & computational mathematics
02 engineering and technology
Parallel computing
01 natural sciences
Basic Linear Algebra Subprograms
Conjugate gradient method
Convergence (routing)
[INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
0202 electrical engineering, electronic engineering, information engineering
Conjugate Gradient
0101 mathematics
Bitwise operation
Accuracy
020203 distributed computing
Rounding
Krylov subspace
[MATH.MATH-NA] Mathematics [math]/Numerical Analysis [math.NA]
Reproducibility
Nondeterministic algorithm
Heterogeneous computing
[INFO.INFO-MS] Computer Science [cs]/Mathematical Software [cs.MS]
[INFO.INFO-AO] Computer Science [cs]/Computer Arithmetic
CPU

Details

Language :: English
ISBN :: 978-1-4503-8842-9
ISBNs :: 9781450388429
Database :: OpenAIRE
Journal :: HPC Asia, The International Conference on High Performance Computing in Asia-Pacific Region
Accession number :: edsair.doi.dedup.....a77fc4ca717a53aab7c1f056c8b103ea
