Back to Search
Start Over
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
- Publication Year :
- 2023
- Publisher :
- arXiv, 2023.
-
Abstract
- We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
- Subjects :
- Signal Processing (eess.SP)
FOS: Computer and information sciences
Computer Science - Machine Learning
Statistics - Machine Learning
Optimization and Control (math.OC)
FOS: Electrical engineering, electronic engineering, information engineering
FOS: Mathematics
Machine Learning (stat.ML)
Electrical Engineering and Systems Science - Signal Processing
Mathematics - Optimization and Control
Machine Learning (cs.LG)
Subjects
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....798ad7b38e1fd68afb05a853e7e96d7a
- Full Text :
- https://doi.org/10.48550/arxiv.2302.01186