
A Generic Improvement to Deep Residual Networks Based on Gradient Flow.

Authors :
Santhanam, Venkataraman
Davis, Larry S.
Source :
IEEE Transactions on Neural Networks & Learning Systems; Jul 2020, Vol. 31, Issue 7, p2490-2499, 10p
Publication Year :
2020

Abstract

Preactivation ResNets consistently outperform the original postactivation ResNets on the CIFAR-10/100 classification benchmarks. However, these results surprisingly do not carry over to the standard ImageNet benchmark. First, we theoretically analyze this incongruity in terms of how the two variants differ in handling the propagation of gradients. Although identity shortcuts are critical in both variants for improving optimization and performance, we show that postactivation variants enable early layers to receive a diverse dynamic composition of gradients from effectively deeper paths in comparison to preactivation variants, enabling the network to make maximal use of its representational capacity. Second, we show that downsampling projections (while only a few in number) have a significantly detrimental effect on performance. We show that by simply replacing downsampling projections with identity-like dense-reshape shortcuts, the classification results of standard residual architectures such as ResNets, ResNeXts, and SE-Nets improve by up to 1.2% on ImageNet, without any increase in computational complexity (FLOPs). [ABSTRACT FROM AUTHOR]
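The following is a minimal illustrative sketch, not the authors' code, of the two block orderings the abstract contrasts: the original postactivation block, where the shortcut is added before a final ReLU, and the preactivation block, where BN/ReLU precede each convolution and the shortcut is a pure addition. The reshape_shortcut function is a hypothetical parameter-free, space-to-depth downsampling shortcut included only to convey what an "identity-like" reshape could look like; the paper's exact dense-reshape construction is not described in the abstract. PyTorch is assumed, and all names and layer widths are illustrative placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PostActBlock(nn.Module):
    # Original (postactivation) ordering: the shortcut is added before the
    # final ReLU, so the identity path passes through a nonlinearity.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)

class PreActBlock(nn.Module):
    # Preactivation ordering: BN/ReLU precede each convolution, so the
    # shortcut remains a pure, unmodified addition.
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + x

def reshape_shortcut(x):
    # Hypothetical parameter-free downsampling shortcut: fold each 2x2 spatial
    # patch into the channel dimension (space-to-depth), so no activation is
    # discarded and no learned projection is introduced. Only an illustration
    # of an "identity-like" reshape, not the paper's dense-reshape shortcut.
    n, c, h, w = x.shape
    x = x.view(n, c, h // 2, 2, w // 2, 2)
    return x.permute(0, 1, 3, 5, 2, 4).reshape(n, 4 * c, h // 2, w // 2)

# Example: both blocks preserve shape; the reshape shortcut halves the spatial
# resolution while quadrupling the channel count.
x = torch.randn(2, 64, 56, 56)
print(PostActBlock(64)(x).shape)   # torch.Size([2, 64, 56, 56])
print(PreActBlock(64)(x).shape)    # torch.Size([2, 64, 56, 56])
print(reshape_shortcut(x).shape)   # torch.Size([2, 256, 28, 28])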

Subjects

Subjects :
COMPUTER architecture

Details

Language :
English
ISSN :
2162-237X
Volume :
31
Issue :
7
Database :
Complementary Index
Journal :
IEEE Transactions on Neural Networks & Learning Systems
Publication Type :
Periodical
Accession number :
144568152
Full Text :
https://doi.org/10.1109/TNNLS.2019.2929198