1. Differentiable neural architecture learning for efficient neural networks.
- Author
- Guo, Qingbei; Wu, Xiao-Jun; Kittler, Josef; Feng, Zhiquan
- Subjects
- *ARTIFICIAL neural networks, *COMPUTATIONAL complexity, *CONVOLUTIONAL neural networks
- Abstract
• We build a new standalone control module based on the scaled sigmoid function, enriching the neural network module family to enable neural architecture optimization.
• Our DNAL method produces a single neural architecture rather than a pool of candidates, drastically improving learning efficiency: 20 epochs for CIFAR-10 and 10 epochs for ImageNet.
• It is applicable to conventional CNNs, lightweight CNNs, and stochastic supernets.
• Extensive experiments confirm that our DNAL method achieves excellent performance on various CNN architectures, including VGG16, ResNet50, MobileNetV2, and ProxylessNAS, on the CIFAR-10 and ImageNet-1K classification tasks.

Efficient neural networks have received ever-increasing attention with the evolution of convolutional neural networks (CNNs), especially regarding their deployment on embedded and mobile platforms. One of the biggest obstacles to obtaining such efficient neural networks is search efficiency; even recent differentiable neural architecture search (DNAS) methods need to sample a small number of candidate neural architectures from which the optimal architecture is selected. To address this computational efficiency issue, we introduce a novel architecture parameterization based on the scaled sigmoid function and propose a general Differentiable Neural Architecture Learning (DNAL) method that obtains efficient neural networks without evaluating candidate networks. Specifically, for stochastic supernets as well as conventional CNNs, we build a new channel-wise module layer whose architecture components are controlled by a scaled sigmoid function. We train these neural network models from scratch. The network optimization is decoupled into weight optimization and architecture optimization, which avoids interaction between the two types of parameters and alleviates the vanishing gradient problem. We address the non-convex optimization problem of efficient neural networks by the continuous scaled sigmoid method instead of the common softmax method. Extensive experiments demonstrate that our DNAL method delivers superior efficiency and adapts to conventional CNNs (e.g., VGG16 and ResNet50), lightweight CNNs (e.g., MobileNetV2), and stochastic supernets (e.g., ProxylessNAS). The optimal neural networks learned by DNAL surpass those produced by state-of-the-art methods on the benchmark CIFAR-10 and ImageNet-1K datasets in accuracy, model size, and computational complexity. Our source code is available at https://github.com/QingbeiGuo/DNAL.git. [ABSTRACT FROM AUTHOR]
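For illustration only, a minimal PyTorch sketch of what a channel-wise control module gated by a scaled sigmoid might look like. The class name ScaledSigmoidGate, the zero initialization of the architecture parameters, and the fixed beta value are assumptions made here for readability, not the authors' implementation; the actual code is in the linked repository.

```python
import torch
import torch.nn as nn

class ScaledSigmoidGate(nn.Module):
    """Channel-wise gate sigma(beta * alpha) applied to a feature map.

    As beta is increased during training, the gate values approach the
    binary set {0, 1}, effectively keeping or pruning each channel.
    (Hypothetical sketch; parameter names are assumptions.)
    """
    def __init__(self, num_channels: int, beta: float = 1.0):
        super().__init__()
        # One learnable architecture parameter per channel.
        self.alpha = nn.Parameter(torch.zeros(num_channels))
        # Scaling factor; assumed here to be annealed upward externally.
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (N, C, H, W); gate each channel independently.
        gate = torch.sigmoid(self.beta * self.alpha)
        return x * gate.view(1, -1, 1, 1)

# Usage example: insert the gate after a convolutional block.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    ScaledSigmoidGate(64),
)
out = block(torch.randn(2, 3, 32, 32))  # -> shape (2, 64, 32, 32)
```

In this reading, the architecture parameters alpha can be optimized in a separate phase from the convolutional weights, consistent with the decoupled optimization the abstract describes.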
- Published
- 2022