
When can transformers compositionally generalize in-context?

Authors:
Kobayashi, Seijin
Schug, Simon
Akram, Yassir
Redhardt, Florian
von Oswald, Johannes
Pascanu, Razvan
Lajoie, Guillaume
Sacramento, João
Publication Year:
2024

Abstract

Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.

Comment: ICML 2024 workshop on Next Generation of Sequence Modeling Architectures

Details

Database:
arXiv
Publication Type:
Report
Accession Number:
edsarx.2407.12275
Document Type:
Working Paper