Resilient computational applications using Coarray Fortran
- Source :
- Parallel Computing. 81:58-67
- Publication Year :
- 2019
- Publisher :
- Elsevier BV, 2019.
Abstract
- With the increase in the number of hardware components and layers of the software stack in High Performance Computing (HPC), the number of user-visible hardware and software failures is likely to increase. Even under the most optimistic assumptions about the reliability of individual components, probabilistic amplification from using millions of nodes has a dramatic impact on the Mean Time Between Failures (MTBF) of the entire platform. Although several techniques to address this problem have been developed, the support that the programming model gives the user to mitigate or work around it is still insufficient. The Fortran 2018 standard defines failed images, a new feature that allows the programmer to detect and manage image failures in a parallel program. In this paper we show how to use failed images and teams, another feature defined in the Fortran 2018 standard, to implement resilient computational applications.
- Subjects :
- Computer Networks and Communications
business.industry
Fortran
Computer science
Probabilistic logic
010103 numerical & computational mathematics
Supercomputer
01 natural sciences
Computer Graphics and Computer-Aided Design
Theoretical Computer Science
010101 applied mathematics
Software
Computer engineering
Artificial Intelligence
Hardware and Architecture
Programming paradigm
0101 mathematics
Programmer
business
Coarray Fortran
computer
computer.programming_language
Details
- ISSN :
- 0167-8191
- Volume :
- 81
- Database :
- OpenAIRE
- Journal :
- Parallel Computing
- Accession number :
- edsair.doi...........e5385af73f12bbc0e3210fc09fbe3094