Author: "Sugolov, Anton" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Sugolov, Anton" returned 3 results.

Author: Aubry, Murdock; Meng, Haoming; Sugolov, Anton; and Papyan, Vardan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have made significant strides in natural language processing, and a precise understanding of the internal mechanisms driving their success is essential. In this work, we trace the trajectories of individual tokens as they pass through transformer blocks, and linearize the system along these trajectories through their Jacobian matrices. By examining the relationships between these Jacobians, we uncover a transformer block coupling phenomenon in a variety of LLMs, characterized by the coupling of their top singular vectors across tokens and depth. Our findings reveal that coupling positively correlates with model performance, and that this relationship is stronger than with other hyperparameters, namely parameter budget, model depth, and embedding dimension. We further investigate the emergence of these properties through training, noting the development of coupling, as well as an increase in linearity and layer-wise exponential growth in the token trajectories. These collective insights provide a novel perspective on the interactions between token embeddings, and prompt further approaches to study training and generalization in LLMs.
Published: 2024

Author: Sugolov, Anton (primary); Emmenegger, Eric (additional); Paterson, Andrew D. (additional); and Sun, Lei (additional)
Published: 2022