Scalable Deadlock-Free Deterministic Minimal-Path Routing Engine for InfiniBand-Based Dragonfly Networks.

Authors :
Maglione-Mathey, German
Yebenes, Pedro
Escudero-Sahuquillo, Jesus
Garcia, Pedro Javier
Quiles, Francisco J.
Zahavi, Eitan
Source :
IEEE Transactions on Parallel & Distributed Systems. Jan2018, Vol. 29 Issue 1, p183-197. 15p.
Publication Year :
2018

Abstract

Dragonfly topologies are gathering great interest nowadays as one of the most promising interconnect options for High-Performance Computing (HPC) systems. However, Dragonflies contain physical cycles that may lead to traffic deadlocks unless the routing algorithm prevents them properly. In general, existing deadlock-free routing algorithms proposed for Dragonflies, either deterministic or adaptive, use Virtual Channels (VCs) to prevent cyclic dependencies. However, these topology-aware algorithms are difficult to implement, or even unfeasible, in systems based on the InfiniBand (IB) architecture, which is nowadays the most widely used network technology in HPC systems. This is due to some limitations in the IB specification, specifically regarding the way Virtual Lanes (VLs), which are considered similar to VCs, can be assigned to traffic flows. Indeed, none of the routing engines currently available in the official releases of the IB control software has been specifically proposed for Dragonflies. In this paper, we present a new deterministic, minimal-path routing for Dragonfly that prevents deadlocks using VLs according to the IB specification, so that it can be straightforwardly implemented in IB-based networks. We have called this proposal D3R (Deterministic Deadlock-free Dragonfly Routing). Specifically, D3R maps each route to a single, specific VL depending on the destination group, and according to a specific order, so that cyclic dependencies (and hence deadlocks) are prevented. D3R is scalable as it requires only 2 VLs to prevent deadlocks regardless of network size, i.e., fewer VLs than those required by the deadlock-free routing engines available in IB that are suitable for Dragonflies. Alternatively, D3R achieves higher throughput if an additional VL is used to reduce internal contention in the Dragonfly groups. We have implemented D3R as a new routing engine in OpenSM, the control software including the subnet manager in IB. We have evaluated D3R by means of simulation and by experiments performed in a real IB-based cluster, the results showing that, in general, D3R outperforms other routing engines. [ABSTRACT FROM PUBLISHER]

Details

Language :
English
ISSN :
1045-9219
Volume :
29
Issue :
1
Database :
Academic Search Index
Journal :
IEEE Transactions on Parallel & Distributed Systems
Publication Type :
Academic Journal
Accession number :
126683504
Full Text :
https://doi.org/10.1109/TPDS.2017.2742503