
High-performance parallel algorithms for the Tucker decomposition of higher order sparse tensors

Authors :
Kaya, Oguz
Uçar, Bora
Laboratoire de l'Informatique du Parallélisme (LIP): Centre National de la Recherche Scientifique (CNRS) - Université de Lyon - Institut National de Recherche en Informatique et en Automatique (Inria) - Université Claude Bernard Lyon 1 (UCBL) - École normale supérieure de Lyon (ENS de Lyon)
Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA): Inria Grenoble - Rhône-Alpes
Source :
[Research Report] RR-8801, Inria - Research Centre Grenoble – Rhône-Alpes, 2015
Publication Year :
2015
Publisher :
HAL CCSD, 2015.

Abstract

We investigate an efficient parallelization of a class of algorithms for the well-known Tucker decomposition of general $N$-dimensional sparse tensors. The targeted algorithms are iterative and use the alternating least squares method. At each iteration, for each dimension of the $N$-dimensional input tensor, the following operations are performed: (i) the tensor is multiplied with $(N - 1)$ matrices (TTM step); (ii) the product is then converted to a matrix; and (iii) a few leading left singular vectors of the resulting matrix are computed (SVD step) to update one of the matrices for the next TTM step. We propose an efficient parallelization of these algorithms for current supercomputers composed of compute nodes, where each node is a multi-core system. We reformulate the computation of $N$ successive TTM steps to increase the reuse of intermediate computations, which is of interest in its own right. We discuss a set of preprocessing steps that takes all computational decisions out of the main iteration of the algorithm, and provide an intuitive row-wise shared-memory parallelism for the TTM and SVD steps. We consider a coarse-grain and a fine-grain computational scheme, investigate their data dependencies, and identify efficient communication schemes. We demonstrate how the computation of singular vectors in the SVD step can be carried out efficiently following the TTM step. Finally, we develop a hybrid MPI-OpenMP implementation of the overall algorithm and report speedup results on up to 2048 cores.
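The per-dimension structure described in the abstract — TTM with the other $(N-1)$ factor matrices, matricization, then an SVD step keeping a few leading left singular vectors — can be sketched as follows. This is a minimal dense NumPy illustration only (the report itself targets sparse tensors and distributed-memory parallelism); the function names `ttm` and `hooi_iteration` are illustrative, not from the report.

```python
import numpy as np

def ttm(tensor, matrix, mode):
    """Tensor-times-matrix: contract `tensor` with `matrix` along `mode`."""
    t = np.moveaxis(tensor, mode, 0)
    shp = t.shape
    t = matrix @ t.reshape(shp[0], -1)          # contract the chosen mode
    return np.moveaxis(t.reshape((matrix.shape[0],) + shp[1:]), 0, mode)

def hooi_iteration(X, factors, ranks):
    """One sweep over all N dimensions: for each mode n, multiply X by the
    other (N-1) factor transposes (TTM step), matricize the result along
    mode n, and keep its leading left singular vectors (SVD step)."""
    N = X.ndim
    for n in range(N):
        Y = X
        for m in range(N):
            if m != n:
                Y = ttm(Y, factors[m].T, m)     # TTM with (N-1) matrices
        Yn = np.moveaxis(Y, n, 0).reshape(X.shape[n], -1)  # matricization
        U, _, _ = np.linalg.svd(Yn, full_matrices=False)
        factors[n] = U[:, :ranks[n]]            # update factor for next TTM
    return factors
```

In the dense case each TTM shrinks the tensor, so the SVD is applied to a small matricized product; the report's contribution is to organize exactly these steps, with reuse of intermediate TTM results, for sparse tensors across MPI ranks and OpenMP threads.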

Details

Language :
English
Database :
OpenAIRE
Journal :
[Research Report] RR-8801, Inria - Research Centre Grenoble – Rhône-Alpes, 2015
Accession number :
edsair.dedup.wf.001..5526e9a4fba78b67b1e3709eb928a808