Advanced Analysis of Learning-Based Spam Email Filtering Methods Based on Feature Distribution Differences of Dataset

Authors :: Jin-Seong Kim
Han-Jin Lee
Han-Ju Lee
Seok-Hwan Choi
Source :: IEEE Access, Vol 12, Pp 167313-167323 (2024)
Publication Year :: 2024
Publisher :: IEEE, 2024.
Abstract: Spam emails, which are unsolicited bulk emails, pose a significant threat to digital communication security. To counter spam emails, learning-based spam email filtering methods have been extensively studied. However, as spam patterns evolve, these methods face challenges in maintaining the accuracy of models trained on outdated patterns. To demonstrate these limitations empirically and gain insight into the classification patterns of spam email filtering models, we propose an advanced method for analyzing the performance degradation of spam email filtering models. The proposed analysis method involves text preprocessing, embedding model training, spam email filtering model training, evaluation, and analysis of the classification patterns of the learning-based spam email filtering models. From experimental results across various datasets and spam email filtering models, we show that the accuracy of spam email filtering models decreases significantly when the feature distribution of the test dataset differs from that of the training dataset. We also provide valuable insights for improving the model architecture, dataset structure, and training strategies by analyzing various factors such as the confusion matrix, performance metrics, mean sequence length, out-of-vocabulary (OOV) rate, and top-20 tokens.
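
Illustrative note (not part of the published abstract): the dataset-level factors named in the abstract, namely mean sequence length, OOV rate, and most frequent tokens, could be computed roughly as in the sketch below. The whitespace tokenizer and the function names `tokenize` and `dataset_stats` are hypothetical assumptions made for illustration; the paper's actual preprocessing and definitions may differ.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, used only for illustration.
    return text.lower().split()


def dataset_stats(train_texts, test_texts, top_k=20):
    """Compare a test set against the vocabulary seen in a training set."""
    train_docs = [tokenize(t) for t in train_texts]
    test_docs = [tokenize(t) for t in test_texts]

    # Vocabulary is every distinct token observed in the training data.
    vocab = {tok for doc in train_docs for tok in doc}
    test_tokens = [tok for doc in test_docs for tok in doc]

    # OOV rate: share of test tokens that never appear in the training data.
    oov_count = sum(1 for tok in test_tokens if tok not in vocab)

    return {
        "mean_seq_len": sum(len(d) for d in test_docs) / len(test_docs) if test_docs else 0.0,
        "oov_rate": oov_count / len(test_tokens) if test_tokens else 0.0,
        "top_tokens": Counter(test_tokens).most_common(top_k),
    }


if __name__ == "__main__":
    train = ["free money now", "meeting at noon"]
    test = ["free crypto now", "claim crypto prize"]
    print(dataset_stats(train, test, top_k=3))
```

Under these assumptions, a high OOV rate or a shifted top-token list on the test set is one simple signal that its feature distribution differs from the training set.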

Subjects :: Spam email filtering
recurrent neural network (RNN)
gated recurrent unit (GRU)
long short-term memory (LSTM)
ALBERT
security
Electrical engineering. Electronics. Nuclear engineering
TK1-9971

Details

Language :: English
ISSN :: 2169-3536
Volume :: 12
Database :: Directory of Open Access Journals
Journal :: IEEE Access
Publication Type :: Academic Journal
Accession number :: edsdoj.6005adbc7e4c899ad34be6e8b16a8b
Document Type :: article
Full Text :: https://doi.org/10.1109/ACCESS.2024.3495830