
Adaptive Regularized Warped Gradient Descent Enhances Model Generalization and Meta-learning for Few-shot Learning.

Authors :
Rao, Shuzhen
Huang, Jun
Tang, Zengming
Source :
Neurocomputing. Jun 2023, Vol. 537, p271-281. 11p.
Publication Year :
2023

Abstract

Warped Gradient Descent (WarpGrad) is a notable meta-learning method that transforms gradients by inserting warp-layers. However, the task-shared initialization provided by WarpGrad is difficult to adapt to each individual task. Moreover, transforming gradients with meta-learned warp-layers ignores local geometric features and task-specific knowledge, and the additional parameters raise a significant risk of overfitting. In this paper, we propose ARWarpGrad, which models both cross-task and task-specific knowledge to achieve better generalization with faster convergence. We introduce Initialization Modulation (IM) to meta-learn a task-specific initialization of the task-learner. Furthermore, Mixed Gradient Preprocessing (MGP), which comprises Adaptive Learning Rates (ALR) and Gaussian Momentum Dropout (GMD), adapts the optimization direction and step length for each task based on local geometric features. In addition, Memory Regularization (MR) uses a parameter memory to alleviate overfitting effectively. Finally, extensive experiments on three settings demonstrate that ARWarpGrad achieves state-of-the-art performance while accelerating convergence and preventing overfitting. [ABSTRACT FROM AUTHOR]
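As an illustration of the components named in the abstract, the following is a minimal NumPy sketch of what a single inner-loop update combining a warp transformation, ALR, GMD, and MR could look like. The concrete forms used below are assumptions for illustration only; the paper's actual formulations are not given in this record.

```python
# Hypothetical sketch only: the concrete forms of the warp, ALR, GMD and MR
# below are illustrative assumptions, not the formulations from the paper.
import numpy as np

rng = np.random.default_rng(0)

def arwarpgrad_step(theta, grad, warp, state,
                    base_lr=0.05, momentum=0.5, gmd_std=0.05, mr_weight=1e-3):
    """One assumed inner-loop update of the task-learner parameters."""
    # Warp-layer: transform the raw gradient with meta-learned parameters
    # (assumed here to be a simple linear map).
    warped = warp @ grad

    # Adaptive Learning Rates (ALR): assumed per-parameter step sizes that
    # shrink where recent gradients have been large (RMSProp-like).
    state["v"] = 0.9 * state["v"] + 0.1 * warped ** 2
    step = base_lr * warped / (np.sqrt(state["v"]) + 1e-8)

    # Gaussian Momentum Dropout (GMD): assumed multiplicative Gaussian noise
    # that randomly damps or amplifies parts of the momentum buffer.
    noise = 1.0 + gmd_std * rng.standard_normal(theta.shape)
    state["m"] = momentum * state["m"] * noise + step

    # Memory Regularization (MR): assumed pull toward a stored parameter
    # memory (here, the task-specific initialization) to limit overfitting.
    reg = mr_weight * (theta - state["memory"])

    return theta - state["m"] - reg

# Toy usage on a quadratic loss L(theta) = 0.5 * ||theta - target||^2.
dim, target = 4, np.ones(4)
theta = np.zeros(dim)                      # stands in for the IM-produced init
warp = np.eye(dim)                         # identity warp for the toy example
state = {"v": np.zeros(dim), "m": np.zeros(dim), "memory": theta.copy()}
for _ in range(100):
    grad = theta - target                  # gradient of the quadratic loss
    theta = arwarpgrad_step(theta, grad, warp, state)
print(theta)                               # hovers near the target of ones
```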

Subjects

Subjects :
*MACHINE learning

Details

Language :
English
ISSN :
0925-2312
Volume :
537
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
163185738
Full Text :
https://doi.org/10.1016/j.neucom.2023.03.042