Convergence of gradient descent for learning linear neural networks.
- Author
- Nguegnang, Gabin Maxime, Rauhut, Holger, and Terstiege, Ulrich
- Subjects
- MATRIX decomposition
- Abstract
We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis of the related gradient flow. We show that, under suitable conditions on the stepsizes, gradient descent converges to a critical point of the loss function, here the square loss. Furthermore, we demonstrate that in the case of two layers, gradient descent converges to a global minimum for almost all initializations. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
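The setting of the abstract can be illustrated with a minimal sketch (not the paper's exact setup): gradient descent on the square loss of a two-layer linear network, i.e., factoring a target matrix `M` as `W2 @ W1`. All dimensions, the stepsize, and the initialization scale below are illustrative assumptions.

```python
import numpy as np

# Two-layer linear network trained with gradient descent on the square
# loss L(W1, W2) = 0.5 * ||W2 W1 - M||_F^2 (a matrix factorization).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))        # target matrix (assumed)
W1 = 0.1 * rng.standard_normal((4, 4)) # first layer, small random init
W2 = 0.1 * rng.standard_normal((4, 4)) # second layer, small random init

eta = 0.05                             # stepsize (assumed small enough)
losses = []
for _ in range(5000):
    R = W2 @ W1 - M                    # residual of the factorization
    losses.append(0.5 * np.linalg.norm(R, "fro") ** 2)
    G1 = W2.T @ R                      # dL/dW1
    G2 = R @ W1.T                      # dL/dW2
    W1 -= eta * G1
    W2 -= eta * G2

print(losses[0], losses[-1])           # the loss decreases toward zero
```

For two layers and a generic random initialization, the iterates typically drive the loss to (near) zero, consistent with the almost-everywhere global convergence stated in the abstract.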
- Published
- 2024