
Novel Convergence Results of Adaptive Stochastic Gradient Descents.

Authors :
Sun, Tao
Qiao, Linbo
Liao, Qing
Li, Dongsheng
Source :
IEEE Transactions on Image Processing; 2021, Vol. 30, p1044-1056, 13p
Publication Year :
2021

Abstract

Adaptive stochastic gradient descent, which uses unbiased samples of the gradient with stepsizes chosen from historical information, has been widely used to train neural networks for computer vision and pattern recognition tasks. This paper revisits the theoretical aspects of two classes of adaptive stochastic gradient descent methods, which contain several existing state-of-the-art schemes. We focus on presenting novel findings: in the general smooth case, nonergodic convergence results are given, that is, the expectation of the gradients’ norm itself, rather than the minimum over past iterates, is proved to converge. We also study their performance when the objective function satisfies the Polyak-Łojasiewicz property; in this case, nonergodic convergence rates are given for the expectation of the function values. Our findings show that stronger restrictions on the stepsizes are needed to guarantee convergence (rates) of the nonergodic function values. [ABSTRACT FROM AUTHOR]
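The abstract refers to adaptive stochastic gradient methods whose stepsizes are built from past gradient information. As a rough illustration only, and not the exact schemes analyzed in the paper, the following minimal AdaGrad-style sketch in Python/NumPy shows one common way such history-dependent stepsizes are formed; the function names, constants, and the quadratic test objective are assumptions chosen for this example.

    # Minimal sketch of an AdaGrad-style adaptive SGD update (illustrative only).
    import numpy as np

    def adagrad_sgd(grad_fn, x0, alpha=0.1, eps=1e-8, n_steps=1000):
        """grad_fn(x) should return an unbiased stochastic gradient at x."""
        x = np.asarray(x0, dtype=float)
        acc = np.zeros_like(x)   # accumulated squared gradients: the "historical information"
        for _ in range(n_steps):
            g = grad_fn(x)
            acc += g * g                              # accumulate coordinate-wise squared gradients
            x -= alpha * g / (np.sqrt(acc) + eps)     # effective stepsize shrinks with history
        return x

    # Example: noisy gradients of f(x) = 0.5 * ||x||^2, a function satisfying the PL inequality.
    rng = np.random.default_rng(0)
    noisy_grad = lambda x: x + 0.01 * rng.standard_normal(x.shape)
    x_final = adagrad_sgd(noisy_grad, x0=np.ones(5))

In this sketch the per-coordinate stepsize alpha / (sqrt(acc) + eps) is determined by all past gradients, which is the sense in which the stepsizes are "chosen from historical information"; the paper's nonergodic results concern the convergence of the current iterate's expected gradient norm (or function value) for such history-dependent schemes.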

Details

Language :
English
ISSN :
1057-7149
Volume :
30
Database :
Complementary Index
Journal :
IEEE Transactions on Image Processing
Publication Type :
Academic Journal
Accession number :
170077583
Full Text :
https://doi.org/10.1109/TIP.2020.3038535