Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

Authors :: Hou, Xia
Li, Qifeng
Yang, Jian
Li, Tongliang
Chai, Linzheng
Wu, Xianjie
Ji, Hangyuan
Li, Zhoujun
Nie, Jixuan
Dun, Jingbo
Song, Wenfeng
Publication Year :: 2024
Abstract: Instruction tuning, an effective technique, aligns the outputs of large language models (LLMs) with human preferences. However, how to generate reasonable multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages CoD (Chain of Dialogue) logic to guide LLMs in generating knowledge-intensive multi-turn dialogues for instruction tuning. By integrating raw documents from both open-source datasets and domain-specific web-crawled documents into a benchmark, K-BENCH, we cover diverse areas such as Wikipedia (English), Science (Chinese), and Artifacts (Chinese). Our approach first decides the logic flow of the current dialogue and then prompts LLMs to produce key phrases for sourcing relevant response content. This methodology enables the creation of the GINSTRUCT instruction dataset, which retains raw-document knowledge within dialogue-style interactions. Using this dataset, we fine-tune GLLM, a model designed to transform raw documents into structured multi-turn dialogues, thereby injecting comprehensive domain knowledge into the supervised fine-tuning (SFT) model for enhanced instruction tuning. This work is a step towards improving the adaptability and effectiveness of LLMs in processing and generating more accurate, contextually nuanced responses across various fields.
Comment: 11 pages, 3 figures
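The abstract's two-step generation loop (decide the dialogue's logic flow, then have an LLM propose key phrases that are used to source each response from the raw document) can be illustrated with a minimal sketch. This is not the authors' R2S implementation: the logic flow is assumed to be a precomputed list of step labels, the LLM is replaced by a hypothetical callable (`propose`, stubbed here as `fake_llm`), and "sourcing" is approximated by simple case-insensitive substring matching over document sentences. All function names, step labels, and the sample document are illustrative assumptions.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: (document, logic step) -> key phrases.
KeyPhraseFn = Callable[[str, str], List[str]]


def split_sentences(document: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def source_response(document: str, key_phrases: List[str]) -> str:
    """Keep only the raw-document sentences that mention at least one key phrase."""
    phrases = [p.lower() for p in key_phrases]
    hits = [s for s in split_sentences(document) if any(p in s.lower() for p in phrases)]
    return " ".join(hits)


def build_dialogue(document: str, logic_flow: List[str], propose: KeyPhraseFn) -> List[Dict]:
    """For each planned step, ask for key phrases, then ground the response
    in sentences copied verbatim from the raw document."""
    turns = []
    for step in logic_flow:
        phrases = propose(document, step)
        turns.append({
            "step": step,
            "key_phrases": phrases,
            "response": source_response(document, phrases),
        })
    return turns


# Usage with an illustrative document and a deterministic fake LLM.
doc = ("The bronze ding was cast in the Shang dynasty. "
       "It was used in rituals. It is now in a museum.")
flow = ["ask_origin", "ask_usage"]  # hypothetical logic-flow labels


def fake_llm(document: str, step: str) -> List[str]:
    """Hypothetical LLM stub returning fixed key phrases per step."""
    return {"ask_origin": ["Shang dynasty"], "ask_usage": ["rituals"]}[step]


turns = build_dialogue(doc, flow, fake_llm)
for turn in turns:
    print(turn["step"], "->", turn["response"])
```

Copying response text verbatim from the document, rather than letting the model paraphrase, is one simple way to keep the generated dialogue anchored to the raw-document knowledge the abstract emphasizes; a real pipeline would likely use a stronger retriever than substring matching.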

Subjects :: Computer Science - Computation and Language
Computer Science - Artificial Intelligence
68T50
I.2.7

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2407.03040
Document Type :: Working Paper
