
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Authors :
Singh, Avi
Co-Reyes, John D.
Agarwal, Rishabh
Anand, Ankesh
Patil, Piyush
Garcia, Xavier
Liu, Peter J.
Harrison, James
Lee, Jaehoon
Xu, Kelvin
Parisi, Aaron
Kumar, Abhishek
Alemi, Alex
Rizkowsky, Alex
Nova, Azade
Adlam, Ben
Bohnet, Bernd
Elsayed, Gamaleldin
Sedghi, Hanie
Mordatch, Igor
Simpson, Isabelle
Gur, Izzeddin
Snoek, Jasper
Pennington, Jeffrey
Hron, Jiri
Kenealy, Kathleen
Swersky, Kevin
Mahajan, Kshiteej
Culp, Laura
Xiao, Lechao
Bileschi, Maxwell L.
Constant, Noah
Novak, Roman
Liu, Rosanne
Warkentin, Tris
Qian, Yundi
Bansal, Yamini
Dyer, Ethan
Neyshabur, Behnam
Sohl-Dickstein, Jascha
Fiedel, Noah
Publication Year :
2023

Abstract

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST^EM, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST^EM scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.

Comment: Accepted to TMLR. Camera-ready version. First three authors contributed equally.
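The abstract describes ReST^EM as a generate-filter-finetune loop. The following is a minimal, runnable Python sketch of that loop structure only, using toy stand-ins; the names sample_solutions, fine_tune, and model_skill are hypothetical and not from the paper's code, and a real implementation would sample solutions from the LM and verify answers against ground truth rather than simulate correctness with a probability.

import random

random.seed(0)

def sample_solutions(model_skill, problems, k):
    # E-step (generate + filter): draw k candidate solutions per problem
    # and keep only those the binary verifier marks correct. Here the
    # verifier is simulated: a sample is "correct" with probability
    # equal to the model's current skill.
    dataset = []
    for p in problems:
        for _ in range(k):
            solution = f"solution-to-{p}"
            if random.random() < model_skill:  # stand-in for binary feedback
                dataset.append((p, solution))
    return dataset

def fine_tune(model_skill, dataset, lr=0.01):
    # M-step (fine-tune): toy update that nudges skill up in proportion
    # to the amount of verified data collected this round.
    return min(1.0, model_skill + lr * len(dataset))

model_skill = 0.2  # stand-in for the initial model's solve rate
problems = [f"problem-{i}" for i in range(10)]

for iteration in range(3):  # "repeat this process a few times"
    data = sample_solutions(model_skill, problems, k=4)
    model_skill = fine_tune(model_skill, data)
    print(f"iter {iteration}: kept {len(data)} verified samples, "
          f"skill={model_skill:.2f}")

Under these toy assumptions, each iteration enlarges the pool of verified samples, which in turn improves the model used for the next round of generation; this mirrors the expectation-maximization framing in the abstract, where filtering plays the role of the E-step and fine-tuning the M-step.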

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2312.06585
Document Type :
Working Paper