Improving Large Language Model Russian Adaptation with Preliminary Vocabulary Optimization.
- Author
Tikhomirov, M. M. and Chernyshev, D. I.
- Abstract
Most of a Large Language Model's (LLM) text comprehension capability comes from generative pre-training on large corpora that include texts from different domains, languages and tasks. As a consequence, an LLM's performance in a specific language depends on how well that language is represented in the training data, which for most state-of-the-art models is biased towards English. The issue is commonly alleviated by further pre-training on the target language; however, due to limited model capacity this often results in knowledge forgetting and text understanding degradation. We argue that the performance drop can be avoided by employing parameter-efficient tuning methods that preserve the integrity of the original model. In this work, we investigate the effectiveness of different vocabulary optimization and adapter tuning schemes for LLM Russian adaptation. Our experimental results with the Solar-10.7B LLM show that the language adaptation process can be substantially accelerated by transferring the embeddings from smaller language-tuned counterparts. Moreover, we find that preliminary vocabulary optimization stabilizes further adapter tuning, thus improving target-language generalization. By applying our two-stage language adaptation approach, we obtain state-of-the-art results on the Russian SuperGLUE and MMLU-RU language understanding datasets for sub-30B-parameter open-source LLMs.
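The abstract outlines a two-stage recipe: first transfer embeddings from a smaller language-tuned model into the larger base model, then tune lightweight adapters while the original weights stay frozen. The sketch below illustrates that general idea only, not the authors' actual code; the donor model name, the least-squares projection between embedding spaces, and the LoRA hyperparameters are all assumptions.

```python
# Illustrative sketch only (not the paper's implementation).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("upstage/SOLAR-10.7B-v1.0")
donor = AutoModelForCausalLM.from_pretrained("russian-tuned-small-model")  # hypothetical donor

# Stage 1: embedding transfer. Assuming donor and base share the same tokenizer
# (and hence vocabulary), fit a linear map from the donor to the base embedding
# space by least squares and overwrite the base input embeddings with the projection.
E_donor = donor.get_input_embeddings().weight.data   # (V, d_donor)
E_base = base.get_input_embeddings().weight.data     # (V, d_base)
W = torch.linalg.lstsq(E_donor, E_base).solution     # (d_donor, d_base)
base.get_input_embeddings().weight.data.copy_(E_donor @ W)

# Stage 2: parameter-efficient adapter tuning; the original weights stay frozen,
# which is what preserves the base model's knowledge.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```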
- Published
2024