XML-AD: Detecting anomalous patterns in XML documents
- Source :
- Information Sciences. 326:71-88
- Publication Year :
- 2016
- Publisher :
- Elsevier BV, 2016.
Abstract
- Many information systems use XML documents to store data and to interact with other systems. Abnormal documents, which can result from either an ongoing cyber attack or the actions of a benign user, can potentially harm the interacting systems and are therefore regarded as a threat. In this paper we address the problem of anomaly detection and localization in XML documents using machine learning techniques. We present XML-AD, a new XML anomaly detection framework. Within this framework, an automatic method for extracting features from XML documents and a practical method for transforming XML features into vectors of fixed dimensionality were developed. With these two methods in place, the XML-AD framework makes it possible to utilize general learning algorithms for anomaly detection. The core of the framework consists of a novel multi-univariate anomaly detection algorithm, ADIFA. The framework was evaluated using four XML document datasets obtained from real information systems. It achieved a true positive detection rate of over 89% with less than 0.2% false positives.
- Subjects :
- Information Systems and Management
Information retrieval
computer.internet_protocol
Computer science
XML Signature
020206 networking & telecommunications
02 engineering and technology
computer.software_genre
Computer Science Applications
Theoretical Computer Science
Simple API for XML
Artificial Intelligence
Control and Systems Engineering
Feature (computer vision)
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
0202 electrical engineering, electronic engineering, information engineering
Information system
020201 artificial intelligence & image processing
Data mining
computer
Software
XML
Details
- ISSN :
- 0020-0255
- Volume :
- 326
- Database :
- OpenAIRE
- Journal :
- Information Sciences
- Accession number :
- edsair.doi...........4fcaf651031baa81dea74732131dc663