SPSRG: a prediction approach for correlated failures in distributed computing systems.

Authors :
Zheng, Weiwei
Wang, Zhili
Huang, Haoqiu
Meng, Luoming
Qiu, Xuesong
Source :
Cluster Computing. Dec. 2016, Vol. 19, Issue 4, pp. 1703-1721 (19 pages).
Publication Year :
2016

Abstract

Failure instances in distributed computing systems (DCSs) have exhibited temporal and spatial correlations, where a single failure instance can trigger a set of failure instances simultaneously or successively within a short time interval. In this work, we propose a correlated failure prediction approach (CFPA) to predict correlated failures of computing elements in DCSs. The approach models correlated-failure patterns using the concept of probabilistic shared risk groups and makes a prediction for correlated failures by exploiting an association rule mining approach in a parallel way. We conduct extensive experiments to evaluate the feasibility and effectiveness of CFPA using both failure traces from Los Alamos National Lab and simulated datasets. The experimental results show that the proposed approach outperforms other approaches in both the failure prediction performance and the execution time, and can potentially provide better prediction performance in a larger system. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1386-7857
Volume :
19
Issue :
4
Database :
Academic Search Index
Journal :
Cluster Computing
Publication Type :
Academic Journal
Accession number :
119755006
Full Text :
https://doi.org/10.1007/s10586-016-0633-2