Scalable Fully Pipelined Hardware Architecture for In-Network Aggregated AllReduce Communication.

Authors :: Liu, Yao
Zhang, Junyi
Liu, Shuo
Wang, Qiaoling
Dai, Wangchen
Cheung, Ray Chak Chung
Source :: IEEE Transactions on Circuits & Systems. Part I: Regular Papers. Oct2021, Vol. 68 Issue 10, p4194-4206. 13p.
Publication Year :: 2021
Abstract: The Ring-AllReduce framework is currently the most popular solution for deploying industry-level distributed machine learning tasks. However, it achieves only about half of the maximum bandwidth, even under optimal conditions. In recent years, several in-network aggregation frameworks have been proposed to overcome this drawback, but little hardware detail has been disclosed. In this paper, we propose a scalable, fully pipelined architecture that handles tasks such as forwarding, aggregation, and retransmission with no bandwidth loss. The architecture is implemented on a Xilinx UltraScale FPGA connected to 8 worker servers with 10 Gb/s network adapters, and it can scale to more complex scenarios involving more workers. Compared with Ring-AllReduce, AllReduce-Switch improves the efficient bandwidth of AllReduce communication by a factor of $1.75\times$. In image training tasks, the proposed hardware architecture achieves up to a $1.67\times$ speedup of the training process. For compute-intensive models, part of the communication speedup may be hidden by computation. In particular, for ResNet-50, AllReduce-Switch speeds up training with MPI and NCCL by $1.30\times$ and $1.04\times$, respectively. [ABSTRACT FROM AUTHOR]
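
The "about half of the maximum bandwidth" figure can be illustrated with the textbook cost model for ring all-reduce, which is not taken from the paper itself. In that model each of N workers transmits 2(N-1)/N times the gradient size (a reduce-scatter pass followed by an all-gather pass), whereas with in-network aggregation each worker is assumed to send its data once and receive the aggregate once. The function names below are hypothetical, and the model ignores latency, packet headers, and retransmission, so it is only a rough sketch.

```python
def ring_bytes_sent(n_workers: int, size_bytes: float) -> float:
    """Bytes each worker transmits in ring all-reduce.

    Standard model: reduce-scatter plus all-gather, each moving
    (N - 1) / N of the buffer per worker.
    """
    if n_workers < 2:
        raise ValueError("ring all-reduce needs at least 2 workers")
    return 2 * (n_workers - 1) / n_workers * size_bytes


def in_network_bytes_sent(size_bytes: float) -> float:
    """Bytes each worker transmits when a switch does the aggregation.

    Assumption: every worker sends its buffer to the switch exactly once.
    """
    return size_bytes


def effective_bandwidth(link_gbps: float, size_bytes: float, bytes_sent: float) -> float:
    """Useful all-reduce throughput as seen by one worker, in Gb/s."""
    return link_gbps * size_bytes / bytes_sent


def ideal_speedup(n_workers: int) -> float:
    """Upper bound on the bandwidth gain of in-network aggregation over the ring."""
    return ring_bytes_sent(n_workers, 1.0) / in_network_bytes_sent(1.0)


if __name__ == "__main__":
    n, link = 8, 10.0  # 8 workers on 10 Gb/s links, as in the abstract
    ring_bw = effective_bandwidth(link, 1.0, ring_bytes_sent(n, 1.0))
    switch_bw = effective_bandwidth(link, 1.0, in_network_bytes_sent(1.0))
    print(f"ring: {ring_bw:.2f} Gb/s, in-network: {switch_bw:.2f} Gb/s")
    print(f"ideal speedup: {ideal_speedup(n):.2f}x")
```

For 8 workers this model gives an ideal ratio of 2 × 7 / 8 = 1.75, which is consistent with the $1.75\times$ figure quoted in the abstract, although the abstract does not say how that number was derived. As N grows, the ratio approaches 2, matching the statement that the ring reaches only about half of the link bandwidth.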

Subjects :: *MACHINE learning
*PEER-to-peer architecture (Computer networks)
*HARDWARE
*BANDWIDTHS
*TASK analysis

Details

Language :: English
ISSN :: 15498328
Volume :: 68
Issue :: 10
Database :: Academic Search Index
Journal :: IEEE Transactions on Circuits & Systems. Part I: Regular Papers
Publication Type :: Periodical
Accession number :: 153763139
Full Text :: https://doi.org/10.1109/TCSI.2021.3098841

