Back to Search Start Over

Rendezvous algorithms for large-scale modeling and simulation.

Authors :
Plimpton, Steven J.
Knight, Christopher
Source :
Journal of Parallel & Distributed Computing. Jan2021, Vol. 147, p184-195. 12p.
Publication Year :
2021

Abstract

Rendezvous algorithms encode a communication pattern that is useful when processors sending data do not know who the receiving processors should be, or vice versa. The idea is to define an intermediate decomposition where datums from different sending processors can "rendezvous" to perform a computation, in a manner that both the senders and eventual receivers of the results can identify the appropriate rendezvous processor. Originally designed for interpolating between overlaid grids with independent parallel decompositions (Plimpton et al., 2004), we have recently found rendezvous algorithms useful for a variety of operations in particle- or grid-based simulation codes when running large problems on large numbers of processors. In particular, we show they can perform well when a load-balanced intermediate decomposition is randomized and not spatial, requiring all-to-all communication to move data between processors. In this case rendezvous algorithms leverage the large bisection communication bandwidths which parallel machines provide. We describe how rendezvous algorithms work in a scientific computing context and give specific examples for molecular dynamics and Direct Simulation Monte Carlo codes which result in dramatic performance improvements versus simpler algorithms which do not scale as well. We explain how a generic rendezvous algorithm can be implemented, and also point out similarities with the MapReduce paradigm popularized by Google and Hadoop. • This paper explains how parallel rendezvous algorithms work in the context of large-scale scientific modeling and simulation codes. • It gives several examples of such algorithms. • It presents performance gains of the new algorithms relative to previously used simpler algorithms which were less scalable. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
07437315
Volume :
147
Database :
Academic Search Index
Journal :
Journal of Parallel & Distributed Computing
Publication Type :
Academic Journal
Accession number :
146786906
Full Text :
https://doi.org/10.1016/j.jpdc.2020.09.001