
Phi-4 Technical Report

Authors:
Abdin, Marah
Aneja, Jyoti
Behl, Harkirat
Bubeck, Sébastien
Eldan, Ronen
Gunasekar, Suriya
Harrison, Michael
Hewett, Russell J.
Javaheripi, Mojan
Kauffmann, Piero
Lee, James R.
Lee, Yin Tat
Li, Yuanzhi
Liu, Weishung
Mendes, Caio C. T.
Nguyen, Anh
Price, Eric
de Rosa, Gustavo
Saarikivi, Olli
Salim, Adil
Shah, Shital
Wang, Xin
Ward, Rachel
Wu, Yue
Yu, Dingli
Zhang, Cyril
Zhang, Yi
Publication Year: 2024

Abstract

We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size -- especially on reasoning-focused benchmarks -- due to improved data, training curriculum, and innovations in the post-training scheme.
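For background on the distillation baseline the abstract contrasts against: earlier Phi models were trained largely to reproduce a teacher model's (GPT-4's) outputs. Below is a minimal sketch of the standard soft-label distillation objective (Hinton et al., 2015). This is illustrative context only, not phi-4's actual training code; the function name and parameters are hypothetical.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard soft-label distillation: KL divergence between the
    teacher's and student's temperature-softened token distributions.
    (Illustrative background only; not phi-4's training objective.)"""
    # Soften both distributions with the temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 so gradient magnitudes stay
    # comparable across temperatures (Hinton et al., 2015).
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * temperature ** 2

if __name__ == "__main__":
    # Toy example: batch of 4 positions over a 10-token vocabulary.
    student = torch.randn(4, 10)
    teacher = torch.randn(4, 10)
    print(distillation_loss(student, teacher).item())
```

At temperature 1 this reduces to ordinary cross-entropy against the teacher's distribution; higher temperatures expose more of the teacher's relative preferences over non-argmax tokens, which is what makes a student trained this way tend to track, rather than exceed, its teacher.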

Details

Database: arXiv
Publication Type: Report
Accession number: edsarx.2412.08905
Document Type: Working Paper