
Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis.

Authors :
Nguyen, Thanh V.
Wong, Raymond K. W.
Hegde, Chinmay
Source :
IEEE Transactions on Information Theory. Jul 2021, Vol. 67, Issue 7, p4669-4692. 24p.
Publication Year :
2021

Abstract

Deep neural networks can achieve impressive performance in the regime where they are massively over-parameterized. Consequently, over the past year, there has been growing interest in analyzing the optimization and generalization properties of over-parameterized networks. However, the majority of existing work applies only to supervised learning; the role of over-parameterization in the unsupervised setting has, by contrast, received far less attention. In this paper, we study the inductive bias of gradient descent for two-layer over-parameterized autoencoders with ReLU activation. We first provide theoretical evidence for the memorization phenomenon observed in recent work, using the property that infinitely wide neural networks trained by gradient descent evolve as linear models. We also analyze the gradient dynamics of autoencoders in the finite-width setting. Starting from a randomly initialized autoencoder network, we rigorously prove linear convergence of gradient descent in two regimes: weakly-trained and jointly-trained. Our results indicate considerable benefits of joint training over weak training in finding global optima, achieving a dramatic decrease in the required level of over-parameterization. Finally, we analyze the case of weight-tied autoencoders and prove that, in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies. [ABSTRACT FROM AUTHOR]
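
The abstract contrasts three setups: a two-layer ReLU autoencoder trained jointly (both layers updated), trained weakly (only the outer layer updated from a random inner layer), and weight-tied (decoder reuses the transposed encoder weights). The sketch below is not the authors' code; it is a minimal illustrative setup in PyTorch, with all class names, widths, and hyperparameters chosen as assumptions for exposition.

    # Minimal sketch (illustrative only) of the regimes named in the abstract:
    # jointly-trained vs. weakly-trained vs. weight-tied two-layer ReLU autoencoders.
    import torch
    import torch.nn as nn

    class TwoLayerReLUAutoencoder(nn.Module):
        """Maps x -> Decode(ReLU(Encode(x))); hidden width m >> d gives over-parameterization."""
        def __init__(self, d: int, m: int, weight_tied: bool = False):
            super().__init__()
            self.encoder = nn.Linear(d, m, bias=False)
            self.weight_tied = weight_tied
            # Weight-tied case: no separate decoder, reuse the encoder weights transposed.
            self.decoder = None if weight_tied else nn.Linear(m, d, bias=False)

        def forward(self, x):
            h = torch.relu(self.encoder(x))
            if self.weight_tied:
                return h @ self.encoder.weight   # W^T ReLU(W x), per sample
            return self.decoder(h)

    def train(model, X, joint: bool = True, steps: int = 500, lr: float = 1e-3):
        """joint=True updates both layers; joint=False ('weak training') freezes the
        randomly initialized encoder and only updates the decoder (untied case only)."""
        if not joint and model.decoder is not None:
            for p in model.encoder.parameters():
                p.requires_grad_(False)
        params = [p for p in model.parameters() if p.requires_grad]
        opt = torch.optim.SGD(params, lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((model(X) - X) ** 2).mean()  # squared reconstruction loss
            loss.backward()
            opt.step()
        return loss.item()

    # Usage (assumed toy dimensions): n=32 samples in R^16, width m=1024 for over-parameterization.
    X = torch.randn(32, 16)
    model = TwoLayerReLUAutoencoder(d=16, m=1024)
    final_loss = train(model, X, joint=True)

In this sketch, "joint training" and "weak training" differ only in which parameters receive gradient updates; the paper's results concern how much width m each regime needs for gradient descent to converge linearly to a global optimum, not the specific optimizer settings used here.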

Details

Language :
English
ISSN :
00189448
Volume :
67
Issue :
7
Database :
Academic Search Index
Journal :
IEEE Transactions on Information Theory
Publication Type :
Academic Journal
Accession number :
151250050
Full Text :
https://doi.org/10.1109/TIT.2021.3065212