
GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization

Authors:
Wu, Zihui
Gao, Haichang
Wang, Ping
Zhang, Shudong
Liu, Zhaoxiang
Lian, Shiguo
Publication Year:
2024

Abstract

Glitch tokens in Large Language Models (LLMs) can trigger unpredictable behaviors, threatening model reliability and safety. Existing detection methods rely on predefined patterns, which limits their adaptability across diverse LLM architectures. We propose GlitchMiner, a gradient-based discrete optimization framework that efficiently identifies glitch tokens by introducing entropy as a measure of prediction uncertainty and employing a local search strategy to explore the token space. Experiments across multiple LLM architectures demonstrate that GlitchMiner outperforms existing methods in detection accuracy and adaptability, achieving an average efficiency improvement of over 10%. The method strengthens vulnerability assessment in LLMs, contributing to the development of more robust and reliable applications. Code is available at https://github.com/wooozihui/GlitchMiner.
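
To make the approach concrete, below is a minimal PyTorch/Transformers sketch of the idea the abstract describes, not the authors' released implementation (see the repository linked above for that). It scores a candidate token by the entropy of the model's next-token distribution in a repetition probe, ranks swap candidates using the gradient of that entropy as a first-order Taylor approximation, and then exactly re-evaluates the top candidates as a local search step. The model name, probe prompt, seed token, and candidate budget are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)
model.eval()
for p in model.parameters():  # we only need gradients w.r.t. the inputs
    p.requires_grad_(False)

def build_probe(token_id: int):
    """Input ids for a repetition probe, plus the position of the probed token."""
    prefix = tok('Please repeat this string exactly: "', return_tensors="pt").input_ids
    suffix = tok('"\nAnswer: "', add_special_tokens=False, return_tensors="pt").input_ids
    ids = torch.cat([prefix, torch.tensor([[token_id]]), suffix], dim=1)
    return ids, prefix.shape[1]

def entropy_of(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the next-token distribution given final-position logits."""
    p = torch.softmax(logits, dim=-1)
    return -(p * torch.log(p + 1e-12)).sum()

@torch.no_grad()
def probe_entropy(token_id: int) -> float:
    """Exact prediction entropy for one candidate token."""
    ids, _ = build_probe(token_id)
    return float(entropy_of(model(ids).logits[0, -1]))

def rank_candidates(seed_id: int, top_k: int = 32) -> list[int]:
    """First-order estimate of which vocabulary tokens maximize probe entropy."""
    emb = model.get_input_embeddings().weight            # (V, d)
    ids, pos = build_probe(seed_id)
    one_hot = torch.nn.functional.one_hot(ids[0], emb.shape[0]).float()
    one_hot.requires_grad_(True)
    logits = model(inputs_embeds=(one_hot @ emb).unsqueeze(0)).logits[0, -1]
    entropy_of(logits).backward()
    # grad[pos, v] approximates the entropy after swapping the seed for token v
    return one_hot.grad[pos].topk(top_k).indices.tolist()

# Local search loop: rank candidates by the gradient approximation, verify
# them exactly, and move to the highest-entropy token found.
seed = tok.encode("hello", add_special_tokens=False)[0]
for step in range(3):
    seed = max(rank_candidates(seed), key=probe_entropy)
    print(step, seed, repr(tok.decode([seed])), round(probe_entropy(seed), 3))

Intuitively, a token the model cannot repeat reliably yields a flat, high-entropy next-token distribution, so maximizing probe entropy surfaces glitch-token candidates, while the gradient ranking keeps each search step to a handful of exact forward passes rather than a sweep over the full vocabulary.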

Details

Database:
arXiv
Publication Type:
Report
Accession number:
edsarx.2410.15052
Document Type:
Working Paper