
Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors

Authors :
Anzt, Hartwig
Dongarra, Jack
Flegar, Goran
Quintana Ortí, Enrique S.
Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors
European Commission
U.S. Department of Energy
European Regional Development Fund
Swiss National Supercomputing Centre
Ministerio de Economía y Competitividad
Helmholtz Association of German Research Centers
Publication Year :
2019

Abstract

[EN] In this work, we address the efficient realization of block-Jacobi preconditioning on graphics processing units (GPUs). This task requires the solution of a collection of small and independent linear systems. To fully realize this implementation, we develop a variable-size batched matrix inversion kernel that uses Gauss-Jordan elimination (GJE) along with a variable-size batched matrix-vector multiplication kernel that transforms the linear systems' right-hand sides into the solution vectors. Our kernels make heavy use of the increased register count and the warp-local communication associated with newer GPU architectures. Moreover, in the matrix inversion, we employ an implicit pivoting strategy that migrates the workload (i.e., operations) to the place where the data resides instead of moving the data to the executing cores. We complement the matrix inversion with extraction and insertion strategies that allow the block-Jacobi preconditioner to be set up rapidly. The experiments on NVIDIA's K40 and P100 architectures reveal that our variable-size batched matrix inversion routine outperforms the CUDA basic linear algebra subroutine (cuBLAS) library functions that provide the same (or even less) functionality. We also show that the preconditioner setup and preconditioner application cost can be somewhat offset by the faster convergence of the iterative solver. (C) 2018 Elsevier B.V. All rights reserved.
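As a rough illustration of the approach the abstract describes, the CUDA sketch below inverts a batch of small, variable-size diagonal blocks in place with Gauss-Jordan elimination, assigning one thread block per matrix and one thread per row. It is not the authors' kernel: it keeps each block in shared memory rather than registers, omits the implicit pivoting and warp-local communication the paper discusses, and all names, layouts, and sizes are illustrative assumptions.

// Minimal sketch: batched in-place Gauss-Jordan inversion of small,
// variable-size diagonal blocks (column-major), one thread block per matrix.
// Assumes block sizes <= 32 and nonzero pivots (no pivoting in this sketch).
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_BLOCK 32  // largest diagonal-block size handled here

__global__ void batched_gje_inverse(double *blocks, const int *offsets,
                                    const int *sizes)
{
    const int b = blockIdx.x;          // which diagonal block
    const int n = sizes[b];            // its (variable) size
    double *A  = blocks + offsets[b];  // column-major n x n block

    __shared__ double sA[MAX_BLOCK * MAX_BLOCK];
    const int row = threadIdx.x;

    // load the block into shared memory, one row per thread
    if (row < n)
        for (int j = 0; j < n; ++j)
            sA[row + j * MAX_BLOCK] = A[row + j * n];
    __syncthreads();

    // in-place Gauss-Jordan inversion
    for (int k = 0; k < n; ++k) {
        if (row == k) {
            double p = sA[k + k * MAX_BLOCK];
            sA[k + k * MAX_BLOCK] = 1.0;
            for (int j = 0; j < n; ++j)
                sA[k + j * MAX_BLOCK] /= p;     // scale pivot row
        }
        __syncthreads();
        if (row < n && row != k) {
            double f = sA[row + k * MAX_BLOCK];
            sA[row + k * MAX_BLOCK] = 0.0;
            for (int j = 0; j < n; ++j)         // eliminate column k
                sA[row + j * MAX_BLOCK] -= f * sA[k + j * MAX_BLOCK];
        }
        __syncthreads();
    }

    // write the inverted block back to global memory
    if (row < n)
        for (int j = 0; j < n; ++j)
            A[row + j * n] = sA[row + j * MAX_BLOCK];
}

int main()
{
    // two diagonal blocks of sizes 2 and 3, stored contiguously, column-major
    const int sizes_h[2]   = {2, 3};
    const int offsets_h[2] = {0, 4};
    double blocks_h[4 + 9] = {
        4, 1, 1, 3,                    // 2x2 block [[4,1],[1,3]]
        5, 1, 0, 1, 4, 1, 0, 1, 3      // 3x3 tridiagonal block
    };

    double *blocks_d; int *sizes_d, *offsets_d;
    cudaMalloc(&blocks_d, sizeof(blocks_h));
    cudaMalloc(&sizes_d, sizeof(sizes_h));
    cudaMalloc(&offsets_d, sizeof(offsets_h));
    cudaMemcpy(blocks_d, blocks_h, sizeof(blocks_h), cudaMemcpyHostToDevice);
    cudaMemcpy(sizes_d, sizes_h, sizeof(sizes_h), cudaMemcpyHostToDevice);
    cudaMemcpy(offsets_d, offsets_h, sizeof(offsets_h), cudaMemcpyHostToDevice);

    batched_gje_inverse<<<2, MAX_BLOCK>>>(blocks_d, offsets_d, sizes_d);
    cudaMemcpy(blocks_h, blocks_d, sizeof(blocks_h), cudaMemcpyDeviceToHost);

    // first column of inv([[4,1],[1,3]]) should be 3/11, -1/11
    printf("inv(2x2) first column: %f %f\n", blocks_h[0], blocks_h[1]);

    cudaFree(blocks_d); cudaFree(sizes_d); cudaFree(offsets_d);
    return 0;
}

Applying the resulting block inverses as a preconditioner then reduces to the batched matrix-vector products mentioned in the abstract, one small product per diagonal block.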

Details

Database :
OAIster
Notes :
TEXT, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1228695702
Document Type :
Electronic Resource