
Stability of accuracy for the training of DNNs via the uniform doubling condition.

Authors :
Shmalo, Yitzchak
Source :
Annals of Mathematics & Artificial Intelligence. Apr 2024, Vol. 92 Issue 2, p439-483. 45p.
Publication Year :
2024

Abstract

We study the stability of accuracy during the training of deep neural networks (DNNs). In this context, the training of a DNN is performed via the minimization of a cross-entropy loss function, and the performance metric is accuracy (the proportion of objects that are classified correctly). While training results in a decrease of loss, the accuracy does not necessarily increase during the process and may sometimes even decrease. The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training. A recent result by Berlyand, Jabin, and Safsten introduces a doubling condition on the training data, which ensures the stability of accuracy during training for DNNs using the absolute value activation function. For training data in R^n, this doubling condition is formulated using slabs in R^n and depends on the choice of the slabs. The goal of this paper is twofold. First, to make the doubling condition uniform, that is, independent of the choice of slabs. This leads to sufficient conditions for stability in terms of training data only. In other words, for a training set T that satisfies the uniform doubling condition, there exists a family of DNNs such that a DNN from this family with high accuracy on the training set at some training time t_0 will have high accuracy for all time t > t_0. Moreover, establishing uniformity is necessary for the numerical implementation of the doubling condition. We demonstrate how to numerically implement a simplified version of this uniform doubling condition on a dataset and apply it to achieve stability of accuracy using a few model examples. The second goal is to extend the original stability results from the absolute value activation function to a broader class of piecewise linear activation functions with finitely many critical points, such as the popular Leaky ReLU. [ABSTRACT FROM AUTHOR]
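The abstract's central observation, that minimizing cross-entropy loss does not force accuracy to increase, can be illustrated with a toy example. The sketch below is not from the paper; the labels and predicted probabilities are invented solely to exhibit one training step where the loss goes down while the accuracy drops.

```python
import math

def cross_entropy(y_true, p):
    # Mean binary cross-entropy loss over the dataset.
    return -sum(y * math.log(q) + (1 - y) * math.log(1 - q)
                for y, q in zip(y_true, p)) / len(y_true)

def accuracy(y_true, p):
    # Proportion of objects classified correctly (threshold 0.5).
    return sum(int(q > 0.5) == y for y, q in zip(y_true, p)) / len(y_true)

y = [1, 1, 0]                 # hypothetical ground-truth labels
p_early = [0.55, 0.55, 0.45]  # earlier step: every object barely correct
p_later = [0.99, 0.99, 0.51]  # later step: two confident hits, one new miss

loss_early, acc_early = cross_entropy(y, p_early), accuracy(y, p_early)
loss_later, acc_later = cross_entropy(y, p_later), accuracy(y, p_later)
# loss decreases (about 0.598 -> 0.244) while accuracy drops (1.0 -> 2/3):
# the confident correct predictions lower the average loss by more than
# the single new misclassification raises it.
```

This is exactly the failure mode the stability results rule out: under the uniform doubling condition on the training set, high accuracy at some time t_0 persists for all later times.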

Details

Language :
English
ISSN :
1012-2443
Volume :
92
Issue :
2
Database :
Academic Search Index
Journal :
Annals of Mathematics & Artificial Intelligence
Publication Type :
Academic Journal
Accession number :
176842421
Full Text :
https://doi.org/10.1007/s10472-023-09919-1