Considerations on the Implementation and Use of Anderson Acceleration on Distributed Memory and GPU-based Parallel Computers

Authors :: Carol S. Woodward
John Loffeld
Source :: Association for Women in Mathematics Series ISBN: 9783319341378
Publication Year :: 2016
Publisher :: Springer International Publishing, 2016.
Abstract :: Recent work suggests that Anderson acceleration can be used as an accelerator to the fixed-point iterative method. To improve the viability of the algorithm, we seek to improve its computational efficiency on parallel machines. The primary kernel of the method is a least-squares minimization within the main loop. We consider two approaches to reduce its cost. The first is to use a communication-avoiding QR factorization, and the second is to employ a GMRES-like restarting procedure. On problems using 1,000 processors or fewer, we find the amount of communication too low to justify communication avoidance. The restarting procedure also proves not to be better than current approaches unless the cost of the function evaluation is very small. In order to begin taking advantage of current trends in machine architecture, we also studied a first-attempt single-node GPU implementation of Anderson acceleration. Performance results show that for sufficiently large problems a GPU implementation can provide a significant performance increase over CPU versions due to the GPU’s higher memory bandwidth.
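
Illustrative note :: The abstract identifies a least-squares minimization inside the main loop as the method's primary kernel. The following is a minimal serial sketch of that structure, not the authors' implementation. The names `anderson`, `g`, `m`, `tol`, and `max_iter` are hypothetical, and the sketch assumes NumPy's dense `lstsq` solver in place of the QR-based and communication-avoiding approaches the paper studies.

```python
import numpy as np

def anderson(g, x0, m=5, tol=1e-10, max_iter=100):
    """Sketch of Anderson-accelerated fixed-point iteration for x = g(x).

    Hypothetical helper: keeps at most m difference vectors and solves
    the small least-squares problem with a dense solver at each step.
    Returns the final iterate and the number of accelerated steps taken.
    """
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    f = gx - x                      # fixed-point residual
    dG, dF = [], []                 # histories of differences in g(x) and f
    for k in range(max_iter):
        if np.linalg.norm(f) < tol:
            return x, k
        if dF:
            F = np.column_stack(dF)
            G = np.column_stack(dG)
            # Least-squares kernel: minimize ||f - F @ gamma||_2
            gamma, *_ = np.linalg.lstsq(F, f, rcond=None)
            x_new = gx - G @ gamma
        else:
            x_new = gx              # first step is a plain fixed-point update
        gx_new = g(x_new)
        f_new = gx_new - x_new
        dG.append(gx_new - gx)
        dF.append(f_new - f)
        if len(dF) > m:             # drop the oldest pair beyond depth m
            dG.pop(0)
            dF.pop(0)
        x, gx, f = x_new, gx_new, f_new
    return x, max_iter

# Usage on a small linear contraction x = A x + b (assumed test problem)
A = np.array([[0.5, 0.2], [0.1, 0.4]])
b = np.array([1.0, 2.0])
x, iters = anderson(lambda v: A @ v + b, np.zeros(2), m=2)
```

In this sketch each iteration costs one evaluation of `g` plus one small least-squares solve over at most `m` columns. On a distributed-memory machine the solve requires global reductions, which is the communication cost the abstract refers to.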

Subjects :: Computer science
Iterative method
Information and Computer Science
Event loop
Memory bandwidth
010103 numerical & computational mathematics
Parallel computing
01 natural sciences
Computational science
QR decomposition
Kernel (image processing)
Distributed memory
Minification
0101 mathematics
