Start Over

Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models

Authors :: Luo, Yi
Lin, Zhenghao
Zhang, Yuhao
Sun, Jiashuo
Lin, Chen
Xu, Chengjin
Su, Xiangdong
Shen, Yelong
Guo, Jian
Gong, Yeyun
Publication Year :: 2024
Abstract: Large Language Models (LLMs) exhibit impressive capabilities but also present risks such as biased content generation and privacy issues. One of the current alignment techniques includes principle-driven integration, but it faces challenges arising from the imprecision of manually crafted rules and inadequate risk perception in models without safety training. To address these, we introduce Guide-Align, a two-stage approach. Initially, a safety-trained model identifies potential risks and formulates specific guidelines for various inputs, establishing a comprehensive library of guidelines and a model for input-guidelines retrieval. Subsequently, the retrieval model correlates new inputs with relevant guidelines, which guide LLMs in response generation to ensure safe and high-quality outputs, thereby aligning with human values. An additional optional stage involves fine-tuning a model with well-aligned datasets generated through the process implemented in the second stage. Our method customizes guidelines to accommodate diverse inputs, thereby enhancing the fine-grainedness and comprehensiveness of the guideline library. Furthermore, it incorporates safety expertise from a safety-trained LLM through a lightweight retrieval model. We evaluate our approach on three benchmarks, demonstrating significant improvements in LLM security and quality. Notably, our fine-tuned model, Labrador, even at 13 billion parameters, outperforms GPT-3.5-turbo and surpasses GPT-4 in alignment capabilities.<br />Comment: Accepted to NAACL 2024 main conference

Subjects :: Computer Science - Computation and Language
Computer Science - Artificial Intelligence

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2403.11838
Document Type :: Working Paper

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models

Abstract

Subjects

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models

Abstract

Subjects

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources