1. Resource aggregation for task-based Cholesky Factorization on top of modern architectures.
- Author
- Cojean, T., Guermouche, A., Hugo, A., Namyst, R., and Wacrenier, P.A.
- Subjects
- *MODERN architecture, *COMPUTING platforms, *PARALLEL programming, *LINEAR algebra, *HETEROGENEOUS computing
- Abstract
- Hybrid computing platforms are now commonplace, featuring a large number of CPU cores and accelerators. This trend makes balancing computations between these heterogeneous resources performance-critical. In this paper we propose aggregating several CPU cores in order to execute larger parallel tasks and improve load balancing between CPUs and accelerators. Additionally, we present our approach to exploiting internal parallelism within tasks by combining two runtime system schedulers: a global runtime system to schedule the main task graph and a local one to cope with internal task parallelism. We demonstrate the relevance of our approach in the context of the dense Cholesky factorization kernel implemented on top of the StarPU task-based runtime system. We present experimental results showing that our solution outperforms state-of-the-art implementations on two architectures: a heterogeneous CPU+GPU machine and the Intel Xeon Phi Knights Landing processor. [ABSTRACT FROM AUTHOR]
- Published
- 2019