Author: "Akram, Yassir" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Akram, Yassir"' showing total 6 results

Start Over Author "Akram, Yassir"

6 results on '"Akram, Yassir"'

1. Learning Randomized Algorithms with Transformers

Author: von Oswald, Johannes, Kobayashi, Seijin, Akram, Yassir, and Steger, Angelika
Subjects: Computer Science - Machine Learning
Abstract: Randomization is a powerful tool that endows algorithms with remarkable properties. For instance, randomized algorithms excel in adversarial settings, often surpassing the worst-case performance of deterministic algorithms with large margins. Furthermore, their success probability can be amplified by simple strategies such as repetition and majority voting. In this paper, we enhance deep neural networks, in particular transformer models, with randomization. We demonstrate for the first time that randomized algorithms can be instilled in transformers through learning, in a purely data- and objective-driven manner. First, we analyze known adversarial objectives for which randomized algorithms offer a distinct advantage over deterministic ones. We then show that common optimization techniques, such as gradient descent or evolutionary strategies, can effectively learn transformer parameters that make use of the randomness provided to the model. To illustrate the broad applicability of randomization in empowering neural networks, we study three conceptual tasks: associative recall, graph coloring, and agents that explore grid worlds. In addition to demonstrating increased robustness against oblivious adversaries through learned randomization, our experiments reveal remarkable performance improvements due to the inherently random nature of the neural networks' computation and predictions.
Published: 2024

2. When can transformers compositionally generalize in-context?

Author: Kobayashi, Seijin, Schug, Simon, Akram, Yassir, Redhardt, Florian, von Oswald, Johannes, Pascanu, Razvan, Lajoie, Guillaume, and Sacramento, João
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution., Comment: ICML 2024 workshop on Next Generation of Sequence Modeling Architectures
Published: 2024

3. Attention as a Hypernetwork

Author: Schug, Simon, Kobayashi, Seijin, Akram, Yassir, Sacramento, João, and Pascanu, Razvan
Subjects: Computer Science - Machine Learning
Abstract: Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not. What mechanisms underlie this ability for compositional generalization? By reformulating multi-head attention as a hypernetwork, we reveal that a composable, low-dimensional latent code specifies key-query specific operations. We find empirically that this latent code is predictive of the subtasks the network performs on unseen task compositions revealing that latent codes acquired during training are reused to solve unseen problem instances. To further examine the hypothesis that the intrinsic hypernetwork of multi-head attention supports compositional generalization, we ablate whether making the hypernetwork generated linear value network nonlinear strengthens compositionality. We find that this modification improves compositional generalization on abstract reasoning tasks. In particular, we introduce a symbolic version of the Raven Progressive Matrices human intelligence test which gives us precise control over the problem compositions encountered during training and evaluation. We demonstrate on this task how scaling model size and data enables compositional generalization in transformers and gives rise to a functionally structured latent space., Comment: Code available at https://github.com/smonsays/hypernetwork-attention
Published: 2024

4. Discovering modular solutions that generalize compositionally

Author: Schug, Simon, Kobayashi, Seijin, Akram, Yassir, Wołczyk, Maciej, Proca, Alexandra, von Oswald, Johannes, Pascanu, Razvan, Sacramento, João, and Steger, Angelika
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Many complex tasks can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. It therefore seems natural to make models more modular to help capture the compositional nature of many tasks. However, it is unclear under which circumstances modular systems can discover hidden compositional structure. To shed light on this question, we study a teacher-student setting with a modular teacher where we have full control over the composition of ground truth modules. This allows us to relate the problem of compositional generalization to that of identification of the underlying modules. In particular we study modularity in hypernetworks representing a general class of multiplicative interactions. We show theoretically that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations. We further demonstrate empirically that under the theoretically identified conditions, meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments., Comment: Published as a conference paper at ICLR 2024; Code available at https://github.com/smonsays/modular-hyperteacher
Published: 2023

5. Gated recurrent neural networks discover attention

Author: Zucchet, Nicolas, Kobayashi, Seijin, Akram, Yassir, von Oswald, Johannes, Larcher, Maxime, Steger, Angelika, and Sacramento, João
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear recurrent layers interconnected by feedforward paths with multiplicative gating. Here, we show how RNNs equipped with these two design elements can exactly implement (linear) self-attention, the main building block of Transformers. By reverse-engineering a set of trained RNNs, we find that gradient descent in practice discovers our construction. In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers. Our findings highlight the importance of multiplicative interactions in neural networks and suggest that certain RNNs might be unexpectedly implementing attention under the hood.
Published: 2023

6. Random initialisations performing above chance and how to find them

Author: Benzing, Frederik, Schug, Simon, Meier, Robert, von Oswald, Johannes, Akram, Yassir, Zucchet, Nicolas, Aitchison, Laurence, and Steger, Angelika
Subjects: Computer Science - Machine Learning
Abstract: Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al.\ recently conjectured that despite different initialisations, the solutions found by SGD lie in the same loss valley after taking into account the permutation invariance of neural networks. Concretely, they hypothesise that any two solutions found by SGD can be permuted such that the linear interpolation between their parameters forms a path without significant increases in loss. Here, we use a simple but powerful algorithm to find such permutations that allows us to obtain direct empirical evidence that the hypothesis is true in fully connected networks. Strikingly, we find that two networks already live in the same loss valley at the time of initialisation and averaging their random, but suitably permuted initialisation performs significantly above chance. In contrast, for convolutional architectures, our evidence suggests that the hypothesis does not hold. Especially in a large learning rate regime, SGD seems to discover diverse modes., Comment: NeurIPS 2022, 14th Annual Workshop on Optimization for Machine Learning (OPT2022)
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

6 results on '"Akram, Yassir"'

1. Learning Randomized Algorithms with Transformers

2. When can transformers compositionally generalize in-context?

3. Attention as a Hypernetwork

4. Discovering modular solutions that generalize compositionally

5. Gated recurrent neural networks discover attention

6. Random initialisations performing above chance and how to find them

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

6 results on '"Akram, Yassir"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources