An analytic theory of shallow networks dynamics for hinge loss classification

Authors :: Pellegrini, Franco
Biroli, Giulio
Laboratoire de physique de l'ENS - ENS Paris (LPENS (UMR_8023))
École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)
Sorbonne Université (SU)
Centre National de la Recherche Scientifique (CNRS)
Université de Paris (UP)
Systèmes Désordonnés et Applications
ANR-19-P3IA-0001, PRAIRIE, PaRis Artificial Intelligence Research InstitutE (2019)
Source :: NeurIPS 2020
Publication Year :: 2020
Publisher :: HAL CCSD, 2020.
Notes :: 16 pages, 6 figures; International audience
Abstract: Neural networks have been shown to perform incredibly well in classification tasks over structured high-dimensional datasets. However, the learning dynamics of such networks is still poorly understood. In this paper we study in detail the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task. We show that in a suitable mean-field limit this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average node population. We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss, for which the dynamics can be explicitly solved. This allows us to address in a simple setting several phenomena appearing in modern networks, such as slowing down of training dynamics, crossover between rich and lazy learning, and overfitting. Finally, we assess the limitations of mean-field theory by studying the case of a large but finite number of nodes and of training samples.

Subjects :: [INFO]Computer Science [cs]

Details

Language :: English
Database :: OpenAIRE
Journal :: NeurIPS 2020
Accession number :: edsair.dedup.wf.001..967745bf112f2772f0049341c0fef258
