Adaptive Regularized Warped Gradient Descent Enhances Model Generalization and Meta-learning for Few-shot Learning.
- Source :
- Neurocomputing, Jun 2023, Vol. 537, p. 271-281 (11 pp.)
- Publication Year :
- 2023
Abstract
- Warped Gradient Descent (WarpGrad) is a remarkable meta-learning method that transforms gradients by inserting warp-layers. However, the task-shared initialization provided by WarpGrad is difficult to adapt to each individual task. Moreover, transforming gradients with meta-learned warp-layers ignores local geometric features and task-specific knowledge, and introduces a significant risk of overfitting due to the increased number of parameters. In this paper, we propose ARWarpGrad to achieve better generalization performance with faster convergence by modeling both cross-task and task-specific knowledge. We introduce Initialization Modulation (IM) to meta-learn a task-specific initialization of the task-learner. Furthermore, we put forward Mixed Gradient Preprocessing (MGP), which comprises Adaptive Learning Rates (ALR) and Gaussian Momentum Dropout (GMD), to provide better adaptive update directions and step sizes for task adaptation based on local geometric features. In addition, Memory Regularization (MR) is introduced to effectively alleviate overfitting through the use of parameter memory. Finally, extensive experiments in three settings demonstrate that ARWarpGrad achieves state-of-the-art performance while accelerating convergence and preventing overfitting. [ABSTRACT FROM AUTHOR]
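The abstract describes an inner-loop update in which the raw gradient is first transformed by a meta-learned warp, then scaled by adaptive learning rates and combined with a randomly dropped-out momentum term. The paper's exact formulation is not given here, so the sketch below is purely illustrative: the function name `mixed_gradient_step`, the linear warp matrix, the fixed dropout probability, and the 0.9 momentum coefficient are all assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_gradient_step(theta, grad, warp, lr, momentum, drop_p=0.1):
    """One hypothetical inner-loop update in the spirit of Mixed Gradient
    Preprocessing: warp the gradient, randomly drop momentum entries
    (a stand-in for Gaussian Momentum Dropout), and apply per-parameter
    adaptive learning rates (a stand-in for ALR)."""
    warped = warp @ grad                         # warp-layer gradient transform
    keep = rng.random(momentum.shape) > drop_p   # random dropout mask on momentum
    momentum = 0.9 * momentum * keep + warped    # momentum update with dropout
    theta = theta - lr * momentum                # per-parameter adaptive step
    return theta, momentum
```

In this toy form, the warp matrix and the per-parameter learning rates would be meta-learned across tasks, while `theta` and `momentum` are task-specific state updated during adaptation.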
- Subjects :
- MACHINE learning
Details
- Language :
- English
- ISSN :
- 0925-2312
- Volume :
- 537
- Database :
- Academic Search Index
- Journal :
- Neurocomputing
- Publication Type :
- Academic Journal
- Accession number :
- 163185738
- Full Text :
- https://doi.org/10.1016/j.neucom.2023.03.042