351. Manual and Compiler Assisted Methods For Generating Fault-Tolerant Parallel Programs
- Author
Prithviraj Banerjee and Amber Roy-Chowdhury
- Subjects
Computer science, Dataflow, Programming language, Parallel computing, Checksum, Distributed memory, Compiler, Affine transformation, High Performance Fortran, Compile time, Intel Paragon
- Abstract
We have developed an automated, compile-time approach to generating error-detecting parallel programs. The compiler is used to identify statements implementing affine transformations within the program and automatically insert code for computing, manipulating, and comparing checksums in order to check the correctness of the code implementing affine transformations. Statements that do not implement affine transformations are checked by duplication. Checksums are reused from one loop to the next where possible, rather than being recomputed for every statement. A global dataflow analysis is performed in order to determine points at which checksums need to be recomputed. We also use a novel method of specifying the data distributions of the check data using directives provided by the High Performance Fortran (HPF) standard so that the computations on the original data and the corresponding check computations are performed on different processors. Results are presented on an Intel Paragon distributed memory multicomputer.
- Published
1995
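
The checksum idea in the abstract can be illustrated with a minimal sketch. This is not the authors' compiler output or their HPF code. It is a hand-written Python illustration under stated assumptions: the function names (`affine`, `predicted_checksum`, `check`, `checked_by_duplication`) and the tolerance `tol` are hypothetical, and everything runs in one process rather than on separate processors. For an affine statement y = A x + b, the sum of the entries of y must equal (column sums of A) · x + sum(b), so the check costs one dot product instead of a second full computation. A non-affine statement has no such shortcut and is simply evaluated twice.

```python
def affine(A, x, b):
    """Compute y = A x + b, with A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def predicted_checksum(A, x, b):
    """Predict sum(y) without computing y: (column sums of A) . x + sum(b)."""
    col_sums = [sum(col) for col in zip(*A)]
    return sum(c * xj for c, xj in zip(col_sums, x)) + sum(b)


def check(y, A, x, b, tol=1e-9):
    """Return True if the checksum of y matches the prediction within tol.

    tol is an assumed tolerance for floating-point round-off.
    """
    return abs(sum(y) - predicted_checksum(A, x, b)) <= tol


def checked_by_duplication(f, *args):
    """Evaluate a non-affine statement twice and compare the results.

    Returns (result, agreed). In this single-process sketch the two runs
    share hardware, so this only illustrates the comparison step.
    """
    first = f(*args)
    second = f(*args)
    return first, first == second


A = [[1.0, 2.0], [3.0, 4.0]]
x = [5.0, 6.0]
b = [0.5, -0.5]

y = affine(A, x, b)
ok_clean = check(y, A, x, b)          # fault-free result passes

y_faulty = list(y)
y_faulty[1] += 1.0                    # simulated fault in one element
ok_faulty = check(y_faulty, A, x, b)  # corrupted result is flagged

# A non-affine statement (elementwise square) checked by duplication.
squares, agreed = checked_by_duplication(lambda v: [e * e for e in v], x)
```

A single sum cannot detect errors that happen to cancel across elements, which is a general limitation of this kind of check rather than anything stated in the abstract.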