Back to Search Start Over

Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures

Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures

Authors :
Mathieu Faverge
David E. Keyes
Dalal Sukkari
Hatem Ltaief
King Abdullah University of Science and Technology (KAUST)
Institut Polytechnique de Bordeaux (Bordeaux INP)
High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS)
Laboratoire Bordelais de Recherche en Informatique (LaBRI)
Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
Plafrim
Chameleon
EA MORSE
ANR-13-MONU-0007,SOLHAR,Solveurs pour architectures hétérogènes utilisant des supports d'exécution(2013)
Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest
Source :
IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers, 2017, XX, ⟨10.1109/TPDS.2017.2755655⟩, IEEE Transactions on Parallel and Distributed Systems, 2017, XX, ⟨10.1109/TPDS.2017.2755655⟩
Publication Year :
2018
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2018.

Abstract

International audience; This paper introduces the first asynchronous, task-based formulation of the polar decomposition and its corresponding implementation on manycore architectures. Based on a new formulation of the iterative QR dynamically-weighted Halley algorithm (QDWH) for the calculation of the polar decomposition, the proposed implementation replaces the original and hostile LU factorization for the condition number estimator by the more adequate QR factorization to enable software portability across various architectures. Relying on fine-grained computations, the novel task-based implementation is also capable of taking advantage of the identity structure of the matrix involved during the QDWH iterations, which decreases the overall algorithmic complexity. Furthermore, the artifactual synchronization points have been weakened compared to previous implementations, unveiling look-ahead opportunities for better hardware occupancy. The overall QDWH-based polar decomposition can then be represented as a directed acyclic graph (DAG), where nodes represent computational tasks and edges define the inter-task data dependencies. The StarPU dynamic runtime system is employed to traverse the DAG, to track the various data dependencies and to asynchronously schedule the computational tasks on the underlying hardware resources, resulting in an out-of-order task scheduling. Benchmarking experiments show significant improvements against existing state-of-the-art high performance implementations (i.e., Intel MKL and Elemental) for the polar decomposition on latest shared-memory vendors' systems (i.e., Intel Haswell/Broadwell/Knights Landing, NVIDIA K80/P100 GPUs and IBM Power8), while maintaining high numerical accuracy.

Details

ISSN :
10459219
Volume :
29
Database :
OpenAIRE
Journal :
IEEE Transactions on Parallel and Distributed Systems
Accession number :
edsair.doi.dedup.....36767dbdbfac5e1da545b79b566dc8be
Full Text :
https://doi.org/10.1109/tpds.2017.2755655