1. Efficiently Distilling LLMs for Edge Applications
- Author
Kundu, Achintya, Lim, Fabian, Chew, Aaron, Wynter, Laura, Chong, Penny, and Lee, Rhui Dih
- Abstract
Supernet training of LLMs is of great interest in industrial applications as it confers the ability to produce a palette of smaller models at constant cost, regardless of the number of models (of different size / latency) produced. We propose a new method called Multistage Low-rank Fine-tuning of Super-transformers (MLFS) for parameter-efficient supernet training. We show that it is possible to obtain high-quality encoder models that are suitable for commercial edge applications, and that while decoder-only models are resistant to a comparable degree of compression, decoders can be effectively sliced for a significant reduction in training time.
- Comment
This paper has been accepted for publication in NAACL 2024 (Industry Track).
- Published
2024
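
The abstract names two ingredients: low-rank fine-tuning and slicing a supernet into smaller subnetworks. The PyTorch sketch below illustrates those two ideas in isolation, under simplifying assumptions; it is not the authors' MLFS implementation, and the names `LoRALinear` and `slice_subnet`, the depth-only notion of slicing, and the toy block structure are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + b + B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only the adapter is trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        # B starts at zero, so the adapted layer initially equals the base layer.
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T) @ self.B.T

def slice_subnet(layers: nn.ModuleList, n_keep: int) -> nn.Sequential:
    """One simple notion of 'slicing': keep only the first n_keep blocks."""
    return nn.Sequential(*list(layers)[:n_keep])

# Toy supernet: a stack of feed-forward blocks standing in for decoder layers.
d_model, n_layers, rank = 64, 12, 4
supernet = nn.ModuleList(
    nn.Sequential(LoRALinear(nn.Linear(d_model, d_model), rank), nn.GELU())
    for _ in range(n_layers)
)

small = slice_subnet(supernet, n_keep=4)     # a shallower model for the edge
x = torch.randn(2, 10, d_model)
print(small(x).shape)                        # torch.Size([2, 10, 64])
trainable = sum(p.numel() for p in small.parameters() if p.requires_grad)
print("trainable params:", trainable)        # 4 blocks x (256 + 256) = 2048
```

Because the pretrained weights stay frozen and shared, any sliced subnetwork taken from the stack carries only its small adapter matrices as trainable state, which is what makes producing a palette of differently sized models cheap relative to training each one from scratch.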