Start Over

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues

Authors :: Duan, Haodong
Wei, Jueqi
Wang, Chonghua
Liu, Hongwei
Fang, Yixiao
Zhang, Songyang
Lin, Dahua
Chen, Kai
Duan, Haodong
Wei, Jueqi
Wang, Chonghua
Liu, Hongwei
Fang, Yixiao
Zhang, Songyang
Lin, Dahua
Chen, Kai
Publication Year :: 2023
Abstract: Interacting with human via high-quality multi-turn dialogues is a key feature of large language models (LLMs). However, human-based evaluation of such capability involves intensive manual labor. This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting, through an LLM-based approach. We start from real-world human dialogues and keep the very first utterances as the ChatSEED. Then we prompt LLMs to generate a full multi-turn dialogue (tens of utterances) based on the ChatSEED, utterance by utterance. Finally, we adopt state-of-the-art LLMs (GPT-4, \etc) as the judge to evaluate the generated dialogues. With different evaluation protocols, we come to substantially identical conclusions. We find that GPT-4 can generate human-style multi-turn dialogues with impressive quality, significantly outperforms its counterparts. It's difficult for a discriminator to distinguish between GPT-4 generated dialogues and human dialogues. In contrast, other LLMs struggle to generate multi-turn dialogues of satisfactory quality due to poor instruction-following capability, tendency to generate lengthy utterances, or limited general capability. All data and codes will be provided in https://github.com/open-compass/BotChat/ and we hope they can serve as a valuable resource for evaluating multi-turn chatting capabilities of LLMs.

Details

Database :: OAIster
Publication Type :: Electronic Resource
Accession number :: edsoai.on1438491381
Document Type :: Electronic Resource

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues

Abstract

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues

Abstract

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources