SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses

Authors :
Jiang, Dongwei
Zhang, Jingyu
Weller, Orion
Weir, Nathaniel
Van Durme, Benjamin
Khashabi, Daniel
Publication Year :
2024

Abstract

Can LLMs consistently improve their previous outputs for better results? For this to be true, LLMs would need to be better at discriminating among previously generated alternatives than at generating initial responses. We explore the validity of this hypothesis in practice. We first formulate a unified framework that allows us to compare the generative and discriminative capability of any model on any task. In our resulting experimental analysis of several open-source and industrial LLMs, we observe that models are not reliably better at discriminating among previously generated alternatives than at generating initial responses. This finding challenges the notion that LLMs may be able to enhance their performance only through their own judgment.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2404.04298
Document Type :
Working Paper