1. Enhancing OpenMP tasking model: performance and portability
- Author
Sara Royuela, Chenle Yu, Eduardo Quinones, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, and Barcelona Supercomputing Center
- Subjects
Tasking model, Computer science, Parallel programming (Computer science), 020206 networking & telecommunications, 02 engineering and technology, Parallel computing, Multiprocessadors, Programació en paral·lel (Informàtica), Supercomputers, OpenMP specification, 020202 computer hardware & architecture, Software portability, Supercomputadors, Symmetric multiprocessing, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, Parallelism (grammar), Overhead (computing), Multiprocessors, Orchestration (computing), Informàtica::Arquitectura de computadors::Arquitectures paral·leles [Àrees temàtiques de la UPC], Runtime overhead, Scaling
- Abstract
OpenMP, as the de facto standard programming model in symmetric multiprocessing for HPC, has seen its performance boosted continuously by the community, either through implementation enhancements or specification augmentations. Furthermore, the language has evolved from a prescriptive nature, as defined by the thread-centric model, to a descriptive behavior, as defined by the task-centric model. However, the overhead related to the orchestration of tasks is still relatively high. Applications exploiting very fine-grained parallelism and systems with a large number of cores available might fail to scale. In this work, we propose to include the concept of Task Dependency Graph (TDG) in the specification by introducing a new clause, named taskgraph, attached to task or target directives. By design, the TDG allows alleviating the overhead associated with the OpenMP tasking model, and it also facilitates linking OpenMP with other programming models that support task parallelism. According to our experiments, a GCC implementation of the taskgraph is able to significantly reduce the execution time of fine-grained task applications and increase their scalability with regard to the number of threads. This work has been supported by the EU H2020 project AMPERE under the grant agreement no. 871669.
- Published
- 2021