Distributed Anomaly Detection Over Big Data

Authors :
Walid Atwa
Arabi Keshk
Mohamed Sakr
Source :
Research Journal of Applied Sciences, Engineering and Technology. 16:77-87
Publication Year :
2019
Publisher :
Maxwell Scientific Publication Corp., 2019.

Abstract

This study addresses the problem of detecting anomalies in big data. A Border-based Grid Partition (BGP) algorithm is proposed that calculates the Local Outlier Factor (LOF) for big data in a distributed environment. BGP splits the data into overlapping subsets and allocates these subsets to the slave nodes, replicating some parts of the subsets between nodes. Each slave node calculates the LOF for the subsets it owns. Because the split is grid-based and does not consider the size of the data assigned to each slave node, BGP results in an unbalanced distribution of the subsets between the slave nodes. To overcome this problem, a modification of BGP is proposed that takes into consideration the size of the data assigned to every slave node. The modified algorithm, called the Balanced Border-based Grid Partition (BBGP) algorithm, splits the data equally between the slave nodes so that all of them perform a balanced share of the LOF computation. Finally, the performance of the two algorithms is evaluated through a series of simulation experiments over real data sets.
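The contrast between a plain grid split and a size-aware split can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the borders or the balanced cut points are chosen, so the sketch assumes a split along the first coordinate only, uses equal-count (quantile) cut points as a stand-in for the balanced variant, and introduces a hypothetical `margin` parameter that controls how much data near a border is replicated into the neighbouring cell. All function names are illustrative.

```python
import random

def equal_width_edges(values, k):
    """Grid-style cut points: k cells of equal width over the value range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]

def equal_count_edges(values, k):
    """Size-aware cut points (assumed): roughly len(values)/k points per cell."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]

def partition(points, edges, margin):
    """Place each point in its own cell (by first coordinate) and replicate it
    into any neighbouring cell whose border lies within `margin`."""
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    cells = [[] for _ in range(len(edges) + 1)]
    for p in points:
        x = p[0]
        for i in range(len(cells)):
            if bounds[i] - margin <= x < bounds[i + 1] + margin:
                cells[i].append(p)
    return cells

# Skewed synthetic data (an assumption, not the paper's data sets).
random.seed(0)
points = [(random.expovariate(1.0), random.random()) for _ in range(1000)]
xs = [p[0] for p in points]

for name, edges in (("equal-width", equal_width_edges(xs, 4)),
                    ("equal-count", equal_count_edges(xs, 4))):
    sizes = [len(cell) for cell in partition(points, edges, margin=0.05)]
    print(name, sizes)
```

On skewed data the equal-width cells receive very different numbers of points, while the equal-count cells are close in size. Since LOF depends on each point's nearest neighbours, the replicated margin in a real system would have to be wide enough to contain the neighbours of points near a border; the fixed `margin` used here is only a placeholder for that requirement.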

Details

ISSN :
2040-7467 and 2040-7459
Volume :
16
Database :
OpenAIRE
Journal :
Research Journal of Applied Sciences, Engineering and Technology
Accession number :
edsair.doi...........1f63f7a0e424c51f6b8ebf7d9ff54e23
Full Text :
https://doi.org/10.19026/rjaset.16.6003