Large Scale Parallelization Using File-Based Communications

Authors :
Byun, Chansup
Kepner, Jeremy
Arcand, William
Bestor, David
Bergeron, Bill
Gadepally, Vijay
Houle, Michael
Hubbell, Matthew
Jones, Michael
Klein, Anna
Michaleas, Peter
Mullen, Julie
Prout, Andrew
Rosa, Antonio
Samsi, Siddharth
Yee, Charles
Reuther, Albert
Publication Year :
2019

Abstract

In this paper, we present a new file-based communication architecture that uses the local filesystem for large-scale parallelization. This approach eliminates the filesystem overload and resource contention that arise when large parallel jobs communicate through the central filesystem. It incurs additional overhead from inter-node message file transfers when the sending and receiving processes are on different nodes. Even with this overhead, however, the approach benefits overall cluster operation and improves message communication performance for large-scale parallel jobs. For example, in a 2048-process parallel job, MPI_Bcast() performed about 34 times better when using the local filesystem. Furthermore, because the security of message file transfers is handled entirely by the secure copy protocol (scp) and filesystem permissions, no security measures or ports are required beyond those typically required on an HPC system.
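
The general idea of file-based messaging on a node-local directory can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`send_local`, `recv_local`, `scp_command`), the file naming scheme, and the use of `pickle` are all assumptions made for illustration. The sketch exchanges a message between two processes on the same node and only builds, without running, the `scp` command that an inter-node transfer might use.

```python
import os
import pickle
import tempfile
import time


def message_path(comm_dir, src, dest, tag):
    # Hypothetical naming scheme: one file per (source, destination, tag).
    return os.path.join(comm_dir, f"msg_{src}_to_{dest}_tag_{tag}.pkl")


def send_local(comm_dir, src, dest, tag, payload):
    """Write a message file for a receiver on the same node."""
    final = message_path(comm_dir, src, dest, tag)
    # Write to a temporary file first, then rename it, so the receiver
    # never observes a partially written message. mkstemp creates the
    # file readable and writable only by the owning user.
    fd, tmp = tempfile.mkstemp(dir=comm_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(payload, f)
    os.replace(tmp, final)
    return final


def recv_local(comm_dir, src, dest, tag, timeout=5.0, poll=0.01):
    """Poll the local directory until the message file appears, then read it."""
    path = message_path(comm_dir, src, dest, tag)
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(path)
        time.sleep(poll)
    with open(path, "rb") as f:
        payload = pickle.load(f)
    os.remove(path)  # consume the message
    return payload


def scp_command(local_file, dest_host, dest_dir):
    """Build (but do not run) an scp command for an inter-node transfer."""
    return ["scp", "-q", local_file, f"{dest_host}:{dest_dir}/"]
```

In this sketch, a sender on the same node as the receiver would call `send_local` directly, while a sender on another node would write the file locally and then hand the list returned by `scp_command` to something like `subprocess.run`. How the actual system names files, detects message arrival, and schedules transfers is not described in the abstract.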

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.1909.01241
Document Type :
Working Paper
Full Text :
https://doi.org/10.1109/HPEC.2019.8916221