1. Anthropomorphic diagnosis of runtime hidden behaviors in OpenMP multi-threaded applications.
- Author
- Wang, Weidong; Li, Dian; Luo, Wangda; Kang, Yujian; Wang, Liqiang
- Subjects
- *HIGH performance computing, *DIAGNOSIS
- Abstract
- Extreme-scale computing involves hundreds of millions of threads with multi-level parallelism running on large-scale hierarchical and heterogeneous hardware. OpenMP multi-threaded applications increasingly suffer from hidden runtime behaviors owing to shared resource contention as well as software- and hardware-related problems. Such hidden behaviors can result in failures and inefficiencies and are among the main challenges in system resiliency. To minimize their impact, one must quickly and accurately detect and diagnose the hidden behaviors that cause the failures. However, it is difficult to identify hidden behaviors in the dynamic and noisy data collected by OpenMP multi-threaded monitoring infrastructures. This paper presents an anthropomorphic diagnosis framework for hidden behaviors of OpenMP multi-threaded applications. In the framework, we first design injected heartbeat functions for OpenMP multi-threaded applications. Then, we leverage the heartbeat sequences to extract features of hidden behaviors. Finally, we develop a feature-learning-based algorithm using heartbeat analysis, named HSA, to diagnose hidden behaviors. We evaluate the framework on the NAS Parallel Benchmarks (NPB), the EPCC OpenMP micro-benchmark suite, and the Jacobi benchmark. The experimental results demonstrate that our framework successfully identifies 90.3% of the injected hidden behaviors of OpenMP multi-threaded applications while incurring low overhead.
• An anthropomorphic framework is proposed for diagnosing hidden behaviors.
• The heartbeat-based framework supports multi-threaded OpenMP applications.
• Small-size samples also demonstrate the high efficiency of this framework.
• The framework has lower overhead and higher accuracy than the state of the art.
• The framework supports extension and specificity in most of our experiments.
[ABSTRACT FROM AUTHOR]
- Published
- 2023