Scalable Mining of Contextual Outliers Using Relevant Subspace.

Authors :
Zhang, Jifu
Yu, Xiaolong
Xun, Yaling
Zhang, Sulan
Qin, Xiao
Source :
IEEE Transactions on Systems, Man & Cybernetics. Systems; Mar2020, Vol. 50 Issue 3, p988-1002, 15p
Publication Year :
2020

Abstract

In this paper, we propose a scalable algorithm for mining contextual outliers using relevant subspaces. The algorithm is developed with the MapReduce programming model and runs on a Hadoop cluster. Relevant subspaces, which effectively capture the local distribution of various datasets, are quantified using the local sparseness of attribute dimensions. We design a novel way of calculating local outlier factors in a relevant subspace from the probability density of local datasets; this approach effectively reflects the outlier degree of a data object that does not fit the distribution of the local dataset in the relevant subspace. The attribute dimensions of a relevant subspace and the local outlier factors are expressed as vital contextual information, which improves the interpretability of outliers. In our solution, the N data objects with the largest local outlier factor values are categorized as contextual outliers. Our scalable mining algorithm, which incorporates a locality-sensitive hashing (LSH) distributed strategy, is implemented on a Hadoop cluster. Experimental results on both synthetic data and stellar spectral data validate the effectiveness, interpretability, scalability, and extensibility of the algorithm. [ABSTRACT FROM AUTHOR]
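
The abstract describes the pipeline only at a high level: choose a relevant subspace per object from the local sparseness of each attribute dimension, score each object by how poorly it fits the probability density of its local dataset in that subspace, and report the N highest-scoring objects together with their subspace as context. The single-machine Python sketch below illustrates that flow under assumptions of our own, not the paper's formulas: the local dataset is taken to be the k nearest neighbours, "local sparseness" is approximated by the per-dimension standard deviation, the density is a product of independent Gaussians, and the score is a log-density ratio. All function names are hypothetical, and the MapReduce/LSH distribution across a Hadoop cluster is omitted.

```python
import heapq
import math


def knn(data, i, k):
    """Indices of the k nearest neighbours of data[i] (brute-force Euclidean, full space)."""
    dists = [(math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i]
    return [j for _, j in heapq.nsmallest(k, dists)]


def local_stats(data, idx, dims):
    """Per-dimension (mean, std) of the local dataset data[idx]; std is floored to avoid /0."""
    stats = {}
    for d in dims:
        vals = [data[j][d] for j in idx]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        stats[d] = (mu, max(math.sqrt(var), 1e-9))
    return stats


def relevant_subspace(data, idx):
    """Assumed rule (illustrative only): keep the locally dense dimensions, i.e. those
    whose local std is at most the mean local std over all dimensions."""
    stats = local_stats(data, idx, range(len(data[0])))
    avg = sum(sd for _, sd in stats.values()) / len(stats)
    dims = [d for d, (_, sd) in stats.items() if sd <= avg]
    return dims or list(stats)  # fall back to the full space if nothing qualifies


def log_density(point, stats, dims):
    """Log of a product of independent 1-D Gaussians fitted to the local dataset."""
    total = 0.0
    for d in dims:
        mu, sd = stats[d]
        z = (point[d] - mu) / sd
        total += -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))
    return total


def outlier_factor(data, i, k):
    """Return (score, relevant dims) for data[i]: the neighbours' mean log-density
    minus the object's own log-density, both measured in its relevant subspace."""
    nbrs = knn(data, i, k)
    dims = relevant_subspace(data, nbrs)
    stats = local_stats(data, nbrs, dims)
    nbr_mean = sum(log_density(data[j], stats, dims) for j in nbrs) / len(nbrs)
    return nbr_mean - log_density(data[i], stats, dims), dims


def top_n_contextual_outliers(data, k, n):
    """Score every object and keep the n largest, each with its subspace as context."""
    scored = [(*outlier_factor(data, i, k), i) for i in range(len(data))]
    best = heapq.nlargest(n, scored, key=lambda t: t[0])
    return [(i, score, dims) for score, dims, i in best]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
              (0.05, 0.05), (0.02, 0.08), (0.08, 0.02), (3.0, 3.0)]
    for idx, score, dims in top_n_contextual_outliers(points, k=3, n=2):
        print(f"object {idx}: factor={score:.2f}, relevant dims={dims}")
```

Because the object and its neighbours are evaluated against the same fitted Gaussians, the normalisation constants cancel and the score reduces to half the difference between the object's squared z-scores and the neighbours' average, so objects from loose and tight neighbourhoods are compared on a common scale. Returning the relevant dimensions alongside each score mirrors the abstract's idea of reporting the subspace as contextual information.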

Details

Language :
English
ISSN :
2168-2216
Volume :
50
Issue :
3
Database :
Complementary Index
Journal :
IEEE Transactions on Systems, Man & Cybernetics. Systems
Publication Type :
Academic Journal
Accession number :
141848616
Full Text :
https://doi.org/10.1109/TSMC.2017.2718592