
No Reason for No Supervision: Improved Generalization in Supervised Models

Authors :
Sariyildiz, Mert Bulent
Kalantidis, Yannis
Alahari, Karteek
Larlus, Diane
Naver Labs Europe (Meylan)
Thoth team (learning models from massive data), Inria Grenoble - Rhône-Alpes
Laboratoire Jean Kuntzmann (LJK), Université Grenoble Alpes (UGA), CNRS, Grenoble INP - Grenoble Institute of Technology

Funding :
ANR-18-CE23-0011, AVENUE, Visual memory network for scene interpretation (2018)
ANR-19-P3IA-0003, MIAI, MIAI @ Grenoble Alpes (2019)
Source :
ICLR 2023 - International Conference on Learning Representations, May 2023, Kigali, Rwanda, pp. 1-26
Publication Year :
2023

Abstract

We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at both the training task and at other (future) transfer tasks. These two seemingly contradictory properties impose a trade-off between improving the model's generalization and maintaining its performance on the original task. Models trained with self-supervised learning tend to generalize better than their supervised counterparts for transfer learning; yet, they still lag behind supervised models on IN1K. In this paper, we propose a supervised learning setup that leverages the best of both worlds. We extensively analyze supervised training using multi-scale crops for data augmentation and an expendable projector head, and reveal that the design of the projector allows us to control the trade-off between performance on the training task and transferability. We further replace the last layer of class weights with class prototypes computed on the fly using a memory bank and derive two models: t-ReX, which achieves a new state of the art for transfer learning and outperforms top methods such as DINO and PAWS on IN1K, and t-ReX*, which matches the highly optimized RSB-A1 model on IN1K while performing better on transfer tasks.

Code and pretrained models: https://europe.naverlabs.com/t-rex
Accepted to ICLR 2023 (spotlight).
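To make the two ingredients named in the abstract concrete, below is a minimal PyTorch sketch of (1) an expendable MLP projector placed between the backbone and the loss, and (2) class prototypes computed on the fly from a memory bank and used in place of a learned last layer of class weights. The class names, layer sizes, temperature, and the FIFO queue design are illustrative assumptions for this sketch, not the authors' released implementation (see the project page above for that).

```python
# Hedged sketch: expendable projector + on-the-fly class prototypes from a memory bank.
# All hyperparameters and the queue-based memory design are assumptions, not t-ReX code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    """Expendable MLP head; after pretraining it is discarded and only the backbone transfers."""
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=256, num_layers=2):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(num_layers - 1):
            layers += [nn.Linear(dim, hidden_dim), nn.BatchNorm1d(hidden_dim), nn.ReLU(inplace=True)]
            dim = hidden_dim
        layers.append(nn.Linear(dim, out_dim))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):
        return self.mlp(x)

class PrototypeMemory:
    """FIFO memory bank of (embedding, label) pairs; a class prototype is the
    L2-normalized mean of the stored embeddings of that class."""
    def __init__(self, size, dim, num_classes):
        self.feats = torch.zeros(size, dim)
        self.labels = torch.full((size,), -1, dtype=torch.long)
        self.num_classes = num_classes
        self.ptr = 0

    @torch.no_grad()
    def update(self, z, y):
        n = z.shape[0]
        idx = (self.ptr + torch.arange(n)) % self.feats.shape[0]
        self.feats[idx] = z.detach().cpu()
        self.labels[idx] = y.cpu()
        self.ptr = int((self.ptr + n) % self.feats.shape[0])

    def prototypes(self):
        protos = torch.zeros(self.num_classes, self.feats.shape[1])
        for c in range(self.num_classes):
            mask = self.labels == c
            if mask.any():
                protos[c] = self.feats[mask].mean(dim=0)
        return F.normalize(protos, dim=1)

def prototype_loss(z, y, memory, temperature=0.1):
    """Cosine-similarity logits against current class prototypes, then cross-entropy."""
    z = F.normalize(z, dim=1)
    logits = z @ memory.prototypes().to(z.device).T / temperature
    loss = F.cross_entropy(logits, y)
    memory.update(z, y)  # refresh the bank with the current batch
    return loss

# Toy usage with random "backbone features" standing in for multi-scale crop embeddings.
backbone_dim, num_classes = 2048, 1000
projector = Projector(in_dim=backbone_dim)
memory = PrototypeMemory(size=8192, dim=256, num_classes=num_classes)
feats = torch.randn(32, backbone_dim)
labels = torch.randint(0, num_classes, (32,))
loss = prototype_loss(projector(feats), labels, memory)
```

In this sketch the prototypes replace a learned classifier matrix, so the classification signal adapts to the current embedding space at every step; classes not yet present in the bank simply contribute zero prototypes until filled. The projector is the knob the abstract refers to for trading off training-task accuracy against transferability.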

Details

Language :
English
Database :
OpenAIRE
Journal :
ICLR 2023 - International Conference on Learning Representations, May 2023, Kigali, Rwanda, pp. 1-26
Accession number :
edsair.doi.dedup.....bc4270cf1dafb5dfaa8c0e7e283008e8