PHY: A performance-driven hybrid communication compression method for distributed training.
- Author
- Chen, Chen-Chun; Chou, Yu-Min; Chou, Jerry
- Subjects
- Artificial neural networks; Optimization algorithms; Distributed computing; Deep learning; Image compression
- Abstract
Distributed training is needed to shorten the training time of deep neural networks. However, communication overhead often hurts performance, especially in distributed computing environments with limited network bandwidth. Gradient compression techniques have therefore been proposed to reduce communication time. But compression also risks lowering model accuracy and lengthening training time due to compression loss and compression overhead. As a result, compression may not consistently achieve the desired results, and there is limited discussion of when, and which, compression should be used. To address this problem, we propose a performance-driven hybrid compression solution. We make three main contributions. (1) We describe a hybrid compression strategy that chooses the compression method for individual model gradients. (2) We build an offline performance estimator and an online loss monitor to ensure the compression decision minimizes training time without sacrificing model accuracy. (3) Our implementation can be imported into existing deep learning frameworks and is applicable to a wide range of compression methods. Up to 3.6x training speedup was observed compared to other state-of-the-art methods.
• Our goal is to construct a better compression strategy for training a DNN model in a distributed computing environment.
• Our proposed compression strategy is fine-grained, hybrid, and performance-driven.
• Our method is built on an offline performance estimator, an online loss monitor, and a linear-search time optimization algorithm.
• Our approach is implemented as a lightweight library that can be imported into existing deep learning frameworks.
• Our evaluations show up to 3.6x training speedup compared to other compression methods. [ABSTRACT FROM AUTHOR]
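The record describes the method only at a high level. As a minimal illustrative sketch (not the paper's actual implementation), the per-gradient decision can be framed as comparing the estimated cost of sending a tensor uncompressed against the estimated cost of a compressed transfer plus its compression overhead; the function names, the linear cost model, and constants such as `topk_overhead_s` below are assumptions for illustration only.

```python
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Top-k sparsification: keep the k largest-magnitude entries (values + indices)."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def estimated_transfer_s(num_values, bandwidth_bytes_s):
    """Time to send num_values float32 values over the given bandwidth."""
    return num_values * 4 / bandwidth_bytes_s

def choose_method(grad, bandwidth_bytes_s, ratio=0.01, topk_overhead_s=1e-4):
    """Per-tensor decision: compress only when the estimate predicts a net win.

    Top-k must send both values and indices, hence the factor of 2.
    The overhead constant stands in for measured compression time.
    """
    full_s = estimated_transfer_s(grad.size, bandwidth_bytes_s)
    k = max(1, int(grad.size * ratio))
    sparse_s = estimated_transfer_s(2 * k, bandwidth_bytes_s) + topk_overhead_s
    return "topk" if sparse_s < full_s else "none"

# Example: per-gradient decisions on a ~1 Gbit/s (125 MB/s) link. A large
# weight tensor favors compression; a small bias tensor is cheaper to send raw.
rng = np.random.default_rng(0)
grads = {"fc.weight": rng.normal(size=(4096, 1024)),
         "fc.bias": rng.normal(size=(1024,))}
for name, g in grads.items():
    print(name, choose_method(g, bandwidth_bytes_s=125e6))
```

Under these assumed numbers the large tensor is sparsified while the small one is sent uncompressed, which mirrors the fine-grained, per-gradient hybrid selection the abstract describes; the paper's actual estimator and loss monitor are more involved than this cost comparison.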
- Published
- 2023