1. ScamGen: Unveiling psychological patterns in tele-scam through advanced template-augmented corpus generation.
- Authors
- Han, Xu; Li, Qiang; Qi, Yaling; Cao, Hongbo; Pedrycz, Witold; Wang, Wei
- Subjects
- VICTIM psychology; NATURAL language processing; PSYCHOLOGY; TELEPHONES; DEEP learning; FRAUD; THEORY
- Abstract
Telephone scams, with their profound psychological impact, often compel victims to make hasty and severe decisions. Studying these scams is challenging due to the scarcity of comprehensive datasets, a result of the private nature of telephone interactions. In this paper, we introduce ScamGen, a template-based data augmentation technique designed to enhance Chinese telephone scam data. ScamGen leverages psychological insights to generate diverse and realistic scam scenarios, focusing on the psychological dynamics between scammers and victims. This novel approach integrates psychological theory with data augmentation, diverging from traditional methods by emphasizing scammer–victim interactions. Our method begins with a multi-source data collection framework, compiling an initial seed dataset of tele-scam samples. Using sentence- and word-level perturbations, we expand this seed data to create a comprehensive and diverse dataset covering a wide range of scam scenarios. Rigorous evaluations demonstrate that ScamGen outperforms large language models in generating high-quality, varied datasets. Additionally, we develop five deep learning models for intent detection on this dataset, with BERT achieving the highest precision at 86.68%. The dataset, which will be made publicly available, marks a significant step toward understanding scammer tactics and improving tele-scam detection systems.
• Introduces ScamGen, a technique for enhancing Chinese telephone scam data.
• Leverages psychological insights to generate diverse scam scenarios.
• Expands a seed dataset using sentence- and word-level perturbations.
• Outperforms large language models in quality and variety of datasets.
• Develops deep learning models for intent detection, with BERT achieving 86.68%.
[ABSTRACT FROM AUTHOR]
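The abstract does not give implementation details of the perturbation step. As a rough, hedged illustration of what word-level perturbation for augmenting a seed utterance set can look like, the Python sketch below randomly swaps known phrases for alternatives. The synonym table, function names, and example sentence are hypothetical (and in English rather than Chinese, for readability); they are not taken from the ScamGen paper.

```python
import random

# Hypothetical phrase-level synonym table for scam-style utterances
# (illustrative only; not the lexicon used by ScamGen).
SYNONYMS = {
    "your account": ["your bank account", "the account in your name"],
    "immediately": ["right away", "within the hour"],
    "verify": ["confirm", "re-confirm"],
}

def word_level_perturb(utterance: str, p: float = 0.5, seed: int = 0) -> str:
    """Replace each known phrase with a random alternative with probability p."""
    rng = random.Random(seed)
    out = utterance
    for phrase, alternatives in SYNONYMS.items():
        if phrase in out and rng.random() < p:
            out = out.replace(phrase, rng.choice(alternatives))
    return out

def augment(seed_utterances, n_variants: int = 3):
    """Expand a small seed set into several perturbed variants per sample."""
    dataset = []
    for text in seed_utterances:
        dataset.append(text)  # keep the original seed sample
        for k in range(n_variants):
            dataset.append(word_level_perturb(text, seed=k + 1))
    return dataset

if __name__ == "__main__":
    seeds = ["Please verify your account immediately or it will be frozen."]
    for sample in augment(seeds):
        print(sample)
```

In a real pipeline, sentence-level perturbations (e.g., reordering or paraphrasing whole turns) would be layered on top of word-level swaps, and Chinese text would first be segmented into words before substitution.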
- Published
- 2025