
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Authors :
Singh, Avi
Co-Reyes, John D.
Agarwal, Rishabh
Anand, Ankesh
Patil, Piyush
Garcia, Xavier
Liu, Peter J.
Harrison, James
Lee, Jaehoon
Xu, Kelvin
Parisi, Aaron
Kumar, Abhishek
Alemi, Alex
Rizkowsky, Alex
Nova, Azade
Adlam, Ben
Bohnet, Bernd
Elsayed, Gamaleldin
Sedghi, Hanie
Mordatch, Igor
Simpson, Isabelle
Gur, Izzeddin
Snoek, Jasper
Pennington, Jeffrey
Hron, Jiri
Kenealy, Kathleen
Swersky, Kevin
Mahajan, Kshiteej
Culp, Laura
Xiao, Lechao
Bileschi, Maxwell L.
Constant, Noah
Novak, Roman
Liu, Rosanne
Warkentin, Tris
Qian, Yundi
Bansal, Yamini
Dyer, Ethan
Neyshabur, Behnam
Sohl-Dickstein, Jascha
Fiedel, Noah
Publication Year :
2023

Abstract

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST^EM, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST^EM scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.

Comment: Accepted to TMLR. Camera-ready version. First three authors contributed equally.
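The abstract describes ReST^EM as a generate-filter-finetune loop. The following is a minimal, runnable Python sketch of that loop structure only, using toy stand-ins; the names sample_solutions, fine_tune, and model_skill are hypothetical and not from the paper's code, and a real implementation would sample solutions from the LM and verify answers against ground truth rather than simulate correctness with a probability.

import random

random.seed(0)

def sample_solutions(model_skill, problems, k):
    # E-step (generate + filter): draw k candidate solutions per problem
    # and keep only those the binary verifier marks correct. Here the
    # verifier is simulated: a sample is "correct" with probability
    # equal to the model's current skill.
    dataset = []
    for p in problems:
        for _ in range(k):
            solution = f"solution-to-{p}"
            if random.random() < model_skill:  # stand-in for binary feedback
                dataset.append((p, solution))
    return dataset

def fine_tune(model_skill, dataset, lr=0.01):
    # M-step (fine-tune): toy update that nudges skill up in proportion
    # to the amount of verified data collected this round.
    return min(1.0, model_skill + lr * len(dataset))

model_skill = 0.2  # stand-in for the initial model's solve rate
problems = [f"problem-{i}" for i in range(10)]

for iteration in range(3):  # "repeat this process a few times"
    data = sample_solutions(model_skill, problems, k=4)
    model_skill = fine_tune(model_skill, data)
    print(f"iter {iteration}: kept {len(data)} verified samples, "
          f"skill={model_skill:.2f}")

Under these toy assumptions, each iteration enlarges the pool of verified samples, which in turn improves the model used for the next round of generation; this mirrors the expectation-maximization framing in the abstract, where filtering plays the role of the E-step and fine-tuning the M-step.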

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2312.06585
Document Type :
Working Paper