Self-healing of workflow activity incidents on distributed computing infrastructures
- Source :
- Future Generation Computer Systems, Elsevier, 2013, 29 (8), pp.2284-2294. ⟨10.1016/j.future.2013.06.012⟩; ICT FP7 Publications Database; OpenAIRE
- Publication Year :
- 2013
- Publisher :
- ELSEVIER SCIENCE BV, 2013.
Abstract
- Distributed computing infrastructures are commonly used through scientific gateways, but operating these gateways requires significant human intervention to handle operational incidents. This paper presents a self-healing process that quantifies incident degrees of workflow activities from metrics measuring the long-tail effect, application efficiency, data transfer issues, and site-specific problems. These metrics are simple enough to be computed online, and they make few assumptions about the application or resource characteristics. Based on their degree, incidents are classified into levels and associated with sets of healing actions, which are selected using association rules that model correlations between incident levels. We specifically study the long-tail effect and propose a new algorithm to control task replication. The healing process is parametrized on real application traces acquired in production on the European Grid Infrastructure. Experimental results obtained in the Virtual Imaging Platform show that the proposed method speeds up execution by up to a factor of 4, consumes up to 26% less resource time than a control execution, and properly detects unrecoverable errors.
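The degree-to-level-to-action flow described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the incident names, thresholds, action names, and the functions `classify_level` and `select_actions` are all hypothetical, degrees are assumed to be normalized to [0, 1], and a plain lookup table stands in for the association rules the authors use to select healing actions.

```python
from bisect import bisect_right

# Hypothetical thresholds that split a degree in [0, 1] into levels 1..n.
# These values are illustrative assumptions, not taken from the paper.
THRESHOLDS = {
    "long_tail": [0.3, 0.7],   # levels 1, 2, 3
    "data_transfer": [0.5],    # levels 1, 2
}

# Hypothetical healing actions per (incident, level).
# Level 1 has no entry here, meaning no action is taken.
ACTIONS = {
    ("long_tail", 2): ["replicate_slow_tasks"],
    ("long_tail", 3): ["replicate_slow_tasks", "alert_operator"],
    ("data_transfer", 2): ["replicate_input_files"],
}


def classify_level(incident, degree):
    """Map an incident degree in [0, 1] to a discrete level (1 = lowest)."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must lie in [0, 1]")
    # bisect_right counts how many thresholds are <= degree.
    return bisect_right(THRESHOLDS[incident], degree) + 1


def select_actions(degrees):
    """Collect healing actions for all incidents, without duplicates.

    A plain lookup table replaces the association rules of the paper.
    """
    selected = []
    for incident, degree in degrees.items():
        level = classify_level(incident, degree)
        for action in ACTIONS.get((incident, level), []):
            if action not in selected:
                selected.append(action)
    return selected
```

For instance, calling `select_actions({"long_tail": 0.8, "data_transfer": 0.6})` classifies the first incident at level 3 and the second at level 2, then merges the actions registered for both.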
- Subjects :
- 020203 distributed computing
Association rule learning
Computer Networks and Communications
Computer science
Distributed computing
Real-time computing
Process (computing)
02 engineering and technology
Grid
Replication (computing)
Task (computing)
Workflow
Resource (project management)
Hardware and Architecture
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing
Software
Workflow management system
Details
- Language :
- English
- ISSN :
- 0167739X
- Database :
- OpenAIRE
- Journal :
- FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE
- Accession number :
- edsair.doi.dedup.....637c8398ca6e24a8145ff17145c5de1b
- Full Text :
- https://doi.org/10.1016/j.future.2013.06.012