1. LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models
- Author
-
Carta, Salvatore Mario, Chessa, Stefano, Contu, Giulia, Corriga, Andrea, Deidda, Andrea, Fenu, Gianni, Frigau, Luca, Giuliani, Alessandro, Grassi, Luca, Manca, Marco Manolo, Marras, Mirko, Mola, Francesco, Mossa, Bastianino, Mura, Piergiorgio, Ortu, Marco, Piano, Leonardo, Pisano, Simone, Pisu, Alessia, Podda, Alessandro Sebastian, Pompianu, Livio, Seu, Simone, and Tiddia, Sandro Gabriele
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies.
- Published
- 2024