SGD with a Constant Large Learning Rate Can Converge to Local Maxima

Authors :
Ziyin, Liu
Li, Botao
Simon, James B.
Ueda, Masahito
Publication Year :
2021
Publisher :
arXiv, 2021.

Abstract

Previous works on stochastic gradient descent (SGD) often focus on its success. In this work, we construct worst-case optimization problems illustrating that, outside the regimes that previous works typically assume, SGD can exhibit many strange and potentially undesirable behaviors. Specifically, we construct landscapes and data distributions such that (1) SGD converges to local maxima, (2) SGD escapes saddle points arbitrarily slowly, (3) SGD prefers sharp minima over flat ones, and (4) AMSGrad converges to local maxima. We also realize these results in a minimal neural-network-like example. Our results highlight the importance of simultaneously analyzing minibatch sampling, discrete-time update rules, and realistic landscapes to understand the role of SGD in deep learning.

Comment :
Fixed typos
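The sketch below is an illustrative toy, not the paper's worst-case construction: it runs plain minibatch SGD with a constant, deliberately large learning rate on a simple 1-D least-squares problem, just to show the ingredients the abstract names (minibatch noise plus discrete-time updates at a fixed large step size). All names and values in it are assumptions for illustration.

```python
# Minimal sketch (not the paper's construction): minibatch SGD with a
# constant, deliberately large learning rate on a toy 1-D squared loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)             # toy inputs
y = 3.0 * X + rng.normal(size=200)   # toy targets, true slope = 3

def minibatch_grad(w, batch_size=8):
    """Stochastic gradient of 0.5*(w*x - y)^2 on a random minibatch."""
    idx = rng.integers(0, len(X), size=batch_size)
    xb, yb = X[idx], y[idx]
    return np.mean((w * xb - yb) * xb)

w, lr = 0.0, 1.5   # constant, large learning rate (no decay schedule)
for step in range(50):
    w -= lr * minibatch_grad(w)
    if step % 10 == 0:
        print(f"step {step:3d}  w = {w:8.3f}")
# With a step size this large, the discrete-time iterates can overshoot and
# oscillate around the minimizer w ≈ 3 rather than settling smoothly,
# illustrating why the regime the paper studies (constant large learning
# rate combined with minibatch noise) behaves differently from idealized
# small-step analyses.
```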

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....98daaa885b63863efd1b1a8b306ce48a
Full Text :
https://doi.org/10.48550/arxiv.2107.11774