301. A proposal and evaluation of a coordinated checkpointing technique using incremental snapshots.
- Author
- Ohara, Mamoru, Arai, Masayuki, Fukumoto, Satoshi, and Iwasaki, Kazuhiko
- Subjects
- *TELECOMMUNICATION systems, *PHOTOGRAPHS, *COMMUNICATION, *DISTRIBUTION (Probability theory), *SOCIOLOGY
- Abstract
- Coordinated checkpointing techniques ensure that a consistent global state is maintained by means of coordination between processes. The approach requires that application messages temporarily cease to be exchanged but the rollback procedure when recovering from a fault is consequently simplified and the recovery costs are small. With current reductions in communications costs, the importance of coordinated techniques may be seen to be growing. However, in large-scale systems there is a possibility that performance will be seriously impaired due to the frequent halting of the exchange of messages. In this paper we propose a method whereby coordination is performed at only a subset of the checkpoint generation points that are periodically visited while at the remaining points each process independently generates an incremental snapshot. This method aims to both alleviate the performance degradation incurred from coordination and to realize relatively high-speed recovery. In evaluating the effectiveness of this method we estimate the checkpointing overheads and recovery costs using a probabilistic model and simulations and compare it with existing coordination methods. The results show that the proposed method is more effective than existing coordination methods from the perspective of both performance and reliability in environments with a relatively low frequency of messages. In addition, we perform comparisons of two different delta schemes for representing the incremental snapshots and discuss which environments they are each respectively suited to. © 2007 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 90(8): 39–53, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20296 [ABSTRACT FROM AUTHOR]
- Published
- 2007
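
The schedule described in the abstract (coordination at only a periodic subset of checkpoint points, incremental snapshots at the rest) can be illustrated with a minimal single-process sketch. Everything named here is hypothetical and not taken from the paper: the `CheckpointLog` class, the `coordination_interval` parameter, the dictionary-based state, and the choice of a delta taken against the previous snapshot. The abstract does not say which two delta schemes the authors compare or how recovery uses the incremental snapshots, and the sketch does not model the inter-process coordination protocol or global consistency.

```python
from typing import Dict, List

State = Dict[str, int]


def diff(base: State, new: State) -> State:
    """Return the entries of `new` whose values differ from `base`.

    Assumption: keys are never deleted, so a delta only records changes.
    """
    return {k: v for k, v in new.items() if base.get(k) != v}


class CheckpointLog:
    """Hypothetical per-process log: full snapshot every k-th checkpoint."""

    def __init__(self, coordination_interval: int) -> None:
        self.k = coordination_interval
        self.count = 0
        self.full: State = {}          # last full (coordinated) snapshot
        self.deltas: List[State] = []  # incremental snapshots since then
        self.last: State = {}          # state at the previous checkpoint

    def checkpoint(self, state: State) -> str:
        if self.count % self.k == 0:
            # Coordinated point: a real system would synchronize with the
            # other processes here; this sketch only stores a full copy.
            self.full = dict(state)
            self.deltas = []
            kind = "coordinated"
        else:
            # Uncoordinated point: store only what changed since the
            # previous checkpoint.
            self.deltas.append(diff(self.last, state))
            kind = "incremental"
        self.last = dict(state)
        self.count += 1
        return kind

    def recover(self) -> State:
        """Rebuild the latest checkpointed state of this one process."""
        state = dict(self.full)
        for delta in self.deltas:
            state.update(delta)
        return state


log = CheckpointLog(coordination_interval=3)
kinds = [
    log.checkpoint({"x": 1, "y": 0}),
    log.checkpoint({"x": 1, "y": 5}),
    log.checkpoint({"x": 2, "y": 5}),
]
restored = log.recover()
```

In this sketch a larger `coordination_interval` means fewer coordinated points but a longer chain of deltas to replay in `recover`, which loosely mirrors the overhead-versus-recovery-cost trade-off the abstract says is evaluated.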