Modeling Memory Contention between Communications and Computations in Distributed HPC Systems

Authors :
Alexandre DENIS
Emmanuel JEANNOT
Philippe SWARTVAGHER
Topology-Aware System-Scale Data Management for High-Performance Computing (TADAAM)
Laboratoire Bordelais de Recherche en Informatique (LaBRI)
Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest
Institut National de Recherche en Informatique et en Automatique (Inria)
Projet Région Nouvelle-Aquitaine 2018-1R50119 'HPC scalable ecosystem'
Grid5000
GENCI
Plafrim
ANR-19-CE46-0009,SOLHARIS,Solveurs pour architectures hétérogènes utilisant des supports d'exécution, objectif scalabilité(2019)
Source :
IPDPS 2022 - IEEE International Parallel and Distributed Processing Symposium Workshops, May 2022, Lyon / Virtual, France. pp.10, ⟨10.1109/IPDPSW55747.2022.00086⟩
Publication Year :
2022
Publisher :
HAL CCSD, 2022.

Abstract

To amortize the cost of MPI communications, distributed parallel HPC applications can overlap network communications with computations, in the hope of improving overall application performance. With this technique, computations and communications run at the same time. However, computations usually also move data. Because data for computations and data for communications go through the same memory system, memory contention may occur when computations are memory-bound and large messages are transmitted through the network at the same time. In this paper we propose a model that predicts the memory bandwidth available to computations and to communications when they execute side by side, according to data locality and taking contention into account. Building the model gave a better understanding of where bottlenecks are located in the memory system and of the strategies the memory system applies under contention. The model was evaluated on many platforms with different characteristics and showed an average prediction error below 4%.

Details

Language :
English
Database :
OpenAIRE
Journal :
IPDPS 2022 - IEEE International Parallel and Distributed Processing Symposium Workshops, May 2022, Lyon / Virtual, France. pp.10, ⟨10.1109/IPDPSW55747.2022.00086⟩
Accession number :
edsair.doi.dedup.....015f0a25f77d6dd3cd3ac68d1f47c84c
Full Text :
https://doi.org/10.1109/IPDPSW55747.2022.00086