Web-S4AE: a semi-supervised stacked sparse autoencoder model for web robot detection.

Authors :
Jagat, Rikhi Ram
Sisodia, Dilip Singh
Singh, Pradeep
Source :
Neural Computing & Applications. Aug2023, Vol. 35 Issue 24, p17883-17898. 16p.
Publication Year :
2023

Abstract

Web robots are automated computer programs that can be used for benign or malicious activities, such as website indexing and monitoring, or unauthorized content scraping and scalping. Several methods are available to detect automated web robots through their footprints and behaviors. However, the accuracy and efficiency of existing methods depend heavily on labeled web log data, and because web robots generate countless web requests daily, exhaustive and accurate manual labeling of reconstructed sessions is time-consuming and challenging. Further, effective detection of web robots is more challenging with unlabeled or partially labeled data. To address these issues, we reformulate web robot detection as a semi-supervised learning problem. In this paper, we propose a deep learning-based Semi-Supervised Stacked Sparse AutoEncoder (Web-S4AE) for web robot detection. The proposed model uses content-based features and features extracted from web access log data to classify web robots effectively. Experiments were conducted on publicly available web log data from a library and information portal to assess the performance of Web-S4AE. The Web-S4AE model was trained in two phases: in the first phase, the model is trained on unlabeled data to extract hidden information, and in the second phase, it is fine-tuned using labeled data. The results suggest that incorporating more unlabeled data can significantly improve the classifier's performance. The Web-S4AE model's performance was also compared with that of other models, namely Decision Tree (DT), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Multi-Layer Perceptron (MLP). [ABSTRACT FROM AUTHOR]
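
The two-phase training described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the L1 sparsity penalty, the learning rates, the function names (pretrain_layer, fine_tune, predict), and the random synthetic data standing in for session features are all assumptions made for illustration. Phase one greedily pretrains each sparse autoencoder layer on unlabeled data; phase two attaches a logistic output and fine-tunes the whole stack on a smaller labeled set.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, rng, epochs=200, lr=0.5, l1=1e-3):
    """Phase 1: fit one sparse autoencoder layer on unlabeled X; return encoder (W, b)."""
    n, d = X.shape
    W = rng.normal(0, 0.1, (d, n_hidden)); b = np.zeros(n_hidden)
    V = rng.normal(0, 0.1, (n_hidden, d)); c = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)        # encode
        R = H @ V + c                 # linear decode
        dR = (R - X) / n              # gradient of the squared reconstruction error
        dH = dR @ V.T + l1 / n        # L1 sparsity term (assumed; activations are positive)
        dZ = dH * H * (1 - H)         # back through the sigmoid
        V -= lr * (H.T @ dR); c -= lr * dR.sum(0)
        W -= lr * (X.T @ dZ); b -= lr * dZ.sum(0)
    return W, b

def fine_tune(X, y, layers, rng, epochs=300, lr=0.5):
    """Phase 2: add a logistic output and train the full stack on labeled data."""
    layers = [(W.copy(), b.copy()) for W, b in layers]
    w = rng.normal(0, 0.1, layers[-1][0].shape[1]); b0 = 0.0
    n = len(y)
    for _ in range(epochs):
        acts = [X]
        for W, b in layers:
            acts.append(sigmoid(acts[-1] @ W + b))
        p = sigmoid(acts[-1] @ w + b0)
        dlogit = (p - y) / n          # gradient of the mean cross-entropy loss
        dA = np.outer(dlogit, w)
        w -= lr * (acts[-1].T @ dlogit); b0 -= lr * dlogit.sum()
        for i in range(len(layers) - 1, -1, -1):
            W, b = layers[i]
            dZ = dA * acts[i + 1] * (1 - acts[i + 1])
            dA = dZ @ W.T
            layers[i] = (W - lr * (acts[i].T @ dZ), b - lr * dZ.sum(0))
    return layers, w, b0

def predict(X, layers, w, b0):
    """Return 1 for sessions classified as robot, 0 otherwise."""
    A = X
    for W, b in layers:
        A = sigmoid(A @ W + b)
    return (sigmoid(A @ w + b0) >= 0.5).astype(int)

# Synthetic stand-in data (assumption): 6 features scaled to [0, 1].
rng = np.random.default_rng(0)
X_unlabeled = rng.random((200, 6))
X_labeled = rng.random((40, 6))
y_labeled = (X_labeled[:, 0] > 0.5).astype(float)

# Greedy layer-wise pretraining: each layer trains on the previous layer's codes.
stack, A = [], X_unlabeled
for n_hidden in (4, 3):
    W, b = pretrain_layer(A, n_hidden, rng)
    stack.append((W, b))
    A = sigmoid(A @ W + b)

stack, w_out, b_out = fine_tune(X_labeled, y_labeled, stack, rng)
labels = predict(X_labeled, stack, w_out, b_out)
```

In this sketch the unlabeled set is larger than the labeled set, mirroring the setting the abstract describes, where labeled sessions are scarce and unlabeled sessions are plentiful.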

Details

Language :
English
ISSN :
09410643
Volume :
35
Issue :
24
Database :
Academic Search Index
Journal :
Neural Computing & Applications
Publication Type :
Academic Journal
Accession number :
167308562
Full Text :
https://doi.org/10.1007/s00521-023-08668-w