On the Parallelization Upper Bound for Asynchronous Stochastic Gradients Descent in Non-convex Optimization.
- Author
- Wang, Lifu and Shen, Bo
- Subjects
- *ASYNCHRONOUS learning, *DISTRIBUTED algorithms, *DEEP learning, *PARALLEL algorithms, *ALGORITHMS, *MATHEMATICAL optimization
- Abstract
- In deep learning, asynchronous parallel stochastic gradient descent (APSGD) is a widely used algorithm for speeding up training. In asynchronous systems, the time delay of stale gradients is generally proportional to the total number of workers. When the number of workers is too large, the delayed gradients deviate further from the current gradient, which may slow the convergence of the algorithm. One may therefore ask: at most how many workers can be used while achieving both convergence and speedup? In this paper, we consider the asynchronous training problem in the non-convex case. We theoretically study the problem of finding an approximate second-order stationary point using asynchronous algorithms in non-convex optimization and investigate the behavior of APSGD near saddle points. This work gives the first theoretical guarantee for finding an approximate second-order stationary point with asynchronous algorithms, together with a provable upper bound on the time delay. The techniques we provide can be generalized to analyze other types of asynchronous algorithms and to understand their behavior in distributed asynchronous parallel training. [ABSTRACT FROM AUTHOR]
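- The mechanism described in the abstract, updating parameters with gradients evaluated at stale iterates, can be illustrated with a small simulation. The sketch below is a hypothetical illustration and not the paper's algorithm or analysis: the objective, the `delayed_sgd` routine, and the `max_delay` parameter are assumptions made here for demonstration. It runs SGD on a toy two-dimensional non-convex objective with a strict saddle at the origin, where each update uses a gradient computed at an iterate up to `max_delay` steps old; `max_delay` stands in for the delay that, in an asynchronous system, grows roughly with the number of workers.

```python
# Hypothetical illustration (not the paper's algorithm or proof): SGD with
# artificially stale gradients on a toy non-convex objective, to show how a
# larger delay changes behavior near a strict saddle point.
import numpy as np

rng = np.random.default_rng(0)
EPS = 0.1  # controls the negative-curvature direction of the saddle


def grad(x):
    # Gradient of f(x1, x2) = 0.5*x1**2 - 0.5*EPS*x2**2 + 0.25*x2**4,
    # which has a strict saddle at the origin and minima at (0, +/-sqrt(EPS)).
    # Gaussian noise stands in for stochastic mini-batch gradients.
    g = np.array([x[0], -EPS * x[1] + x[1] ** 3])
    return g + 0.01 * rng.standard_normal(2)


def delayed_sgd(x0, lr=0.05, steps=2000, max_delay=0):
    """SGD where each step applies a gradient computed at a stale iterate.

    The staleness tau_t is drawn uniformly from {0, ..., max_delay}; in an
    asynchronous system this delay is roughly proportional to the number
    of workers.
    """
    history = [np.array(x0, dtype=float)]
    x = history[0].copy()
    for _ in range(steps):
        tau = rng.integers(0, min(max_delay, len(history) - 1) + 1)
        stale_x = history[-1 - tau]      # iterate from tau steps ago
        x = x - lr * grad(stale_x)       # delayed-gradient update
        history.append(x.copy())
    return x


if __name__ == "__main__":
    x0 = (1e-3, 1e-3)  # start near the saddle at the origin
    for d in (0, 8, 64):
        x = delayed_sgd(x0, max_delay=d)
        print(f"max_delay={d:3d}  final x = {np.round(x, 3)}  "
              f"||grad|| = {np.linalg.norm(grad(x)):.4f}")
```

- Under these assumptions, small delays behave much like synchronous SGD and drift away from the saddle toward a minimizer, while a sufficiently large delay relative to the step size can destabilize the iterates, which is the kind of trade-off the paper's delay upper bound formalizes.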
- Published
- 2023