
An effective 3-D fast Fourier transform framework for multi-GPU accelerated distributed-memory systems.

Authors :
Zhou, Binbin
Lu, Lu
Source :
Journal of Supercomputing. Oct 2022, Vol. 78 Issue 15, p17055-17073. 19p.
Publication Year :
2022

Abstract

This paper introduces an efficient and flexible 3D FFT framework for state-of-the-art multi-GPU distributed-memory systems. In contrast to traditional pure-MPI implementations, the framework exploits such systems through a hybrid multi-GPU programming model that combines MPI with OpenMP to achieve effective communication. To accelerate intra-node communication, it adopts an asynchronous strategy that creates multiple streams and threads to reduce blocking time. Furthermore, we combine our scheme with a GPU-aware MPI implementation to perform GPU-to-GPU data transfers without CPU involvement. We also optimize the local FFT and transpose steps by creating fast parallel kernels that accelerate the overall transform. Results show that our framework outperforms the state-of-the-art distributed 3D FFT library, running up to 2× faster on a single node and 1.65× faster on two nodes. [ABSTRACT FROM AUTHOR]
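
The "local FFT, transpose, local FFT" structure mentioned in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not state which decomposition the framework uses, so a slab decomposition is assumed here; the "ranks" are simulated with plain Python lists, list indexing stands in for the MPI all-to-all exchange, and a naive DFT stands in for the tuned GPU kernels. All function names (`dft`, `fft_axis1`, `slab_fft3d`, and so on) are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(n^2) 1-D DFT; a stand-in for a tuned local FFT kernel."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_axis2(block):
    """Transform block[i][j][k] along its last axis k."""
    return [[dft(row) for row in plane] for plane in block]

def fft_axis1(block):
    """Transform block[i][j][k] along its middle axis j."""
    out = []
    for plane in block:
        nj, nk = len(plane), len(plane[0])
        cols = [dft([plane[j][k] for j in range(nj)]) for k in range(nk)]
        out.append([[cols[k][j] for k in range(nk)] for j in range(nj)])
    return out

def slab_fft3d(a, p):
    """3-D DFT of a[x][y][z] using p simulated ranks (assumed slab layout)."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    assert nx % p == 0 and ny % p == 0, "p must divide both nx and ny"
    sx, sy = nx // p, ny // p
    # Step 1: rank r owns x-planes [r*sx, (r+1)*sx) and transforms y and z locally.
    slabs = [fft_axis1(fft_axis2(a[r * sx:(r + 1) * sx])) for r in range(p)]
    # Step 2: global transpose (the all-to-all exchange in a real MPI code).
    # Afterwards rank r holds local[y_local][x][z] for y in [r*sy, (r+1)*sy).
    pencils = [[[slabs[x // sx][x % sx][y] for x in range(nx)]
                for y in range(r * sy, (r + 1) * sy)] for r in range(p)]
    # Step 3: x is now the middle axis of each local block; transform it locally.
    pencils = [fft_axis1(local) for local in pencils]
    # Gather into out[kx][ky][kz] so the result can be inspected in one place.
    return [[[pencils[y // sy][y % sy][x][z] for z in range(nz)]
             for y in range(ny)] for x in range(nx)]

def dft3d_reference(a):
    """Direct triple-sum 3-D DFT, used only to check the sketch."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    w = lambda j, k, n: cmath.exp(-2j * cmath.pi * j * k / n)
    return [[[sum(a[x][y][z] * w(x, kx, nx) * w(y, ky, ny) * w(z, kz, nz)
                  for x in range(nx) for y in range(ny) for z in range(nz))
              for kz in range(nz)] for ky in range(ny)] for kx in range(nx)]

def max_abs_diff(u, v):
    """Largest elementwise magnitude difference between two 3-D arrays."""
    return max(abs(s - t) for pu, pv in zip(u, v)
               for ru, rv in zip(pu, pv) for s, t in zip(ru, rv))
```

In a distributed code, step 2 is typically where the communication cost concentrates, which is consistent with the abstract's emphasis on asynchronous streams and GPU-aware MPI; the sketch only demonstrates that the decomposed computation reproduces the direct 3-D transform.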

Details

Language :
English
ISSN :
09208542
Volume :
78
Issue :
15
Database :
Academic Search Index
Journal :
Journal of Supercomputing
Publication Type :
Academic Journal
Accession number :
159501057
Full Text :
https://doi.org/10.1007/s11227-022-04491-7