1. Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism
- Author
Eduard Ayguadé, Kevin Sala, Marcos Maronas, Vicenç Beltran, Sergi Mateo
Affiliations: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors; Barcelona Supercomputing Center; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
- Subjects
FOS: Computer and information sciences, Exploit, Computer science, 010103 numerical & computational mathematics, 02 engineering and technology, Parallel computing, 01 natural sciences, Synchronization (computer science), 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), Multiprocessors, Runtime systems, 0101 mathematics, Computer science::Computer architecture::Parallel architectures [UPC subject areas], Execution model, 020203 distributed computing, Parallel processing (Electronic computers), Programming models, Fine grained loop parallelism, Structured parallelism, Task (computing), Shared memory, Computer Science - Distributed, Parallel, and Cluster Computing, Programming paradigm, Parallelism (grammar), Distributed, Parallel, and Cluster Computing (cs.DC)
- Abstract
Shared memory programming models usually provide worksharing and task constructs. The former rely on the efficient fork-join execution model to exploit structured parallelism, while the latter rely on fine-grained synchronization among tasks and a flexible data-flow execution model to exploit dynamic, irregular, and nested parallelism. In applications that show both structured and unstructured parallelism, worksharing and task constructs can be combined. However, it is difficult to mix the two execution models without penalizing the data-flow execution model. Hence, in many applications structured parallelism is also exploited using tasks, to leverage the full benefits of a pure data-flow execution model. Task creation and management, however, might introduce a non-negligible overhead that prevents the efficient exploitation of fine-grained structured parallelism, especially on many-core processors. In this work, we propose worksharing tasks. These are tasks that internally leverage worksharing techniques to exploit fine-grained structured loop-based parallelism. The evaluation shows promising results on several benchmarks and platforms.
This work is supported by the Spanish Ministerio de Ciencia, Innovación y Universidades (TIN2015-65316-P), by the Generalitat de Catalunya (2014-SGR-1051) and by the European Union’s Seventh Framework Programme (FP7/2007-2013) and the H2020 funding framework under grant agreement no. H2020-FETHPC-754304 (DEEP-EST).
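To make the contrast drawn in the abstract concrete, the following minimal sketch shows the two existing styles in standard OpenMP for C: a worksharing loop under fork-join, and the same loop expressed as one task per block with data-flow dependences. It is an illustration only, not the authors' implementation. The function names, the array size `N`, and the block size `BLOCK` are hypothetical, and the sketch does not show the syntax of the proposed worksharing task construct.

```c
#include <assert.h>

#define N 1024      /* hypothetical problem size */
#define BLOCK 128   /* hypothetical task granularity */

/* Worksharing style: one fork-join region splits the iteration space
 * among the threads of the team, with a barrier at the end. */
static void scale_worksharing(double *a, double f) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        a[i] *= f;
}

/* Task style: one thread creates a task per block. Each task declares a
 * dependence on its own block, so it can be ordered against other tasks
 * that touch the same data without a global barrier between phases. */
static void scale_tasks(double *a, double f) {
    assert(N % BLOCK == 0);  /* blocks must tile the array exactly */
    #pragma omp parallel
    #pragma omp single
    {
        for (int b = 0; b < N; b += BLOCK) {
            #pragma omp task depend(inout: a[b:BLOCK]) firstprivate(b)
            for (int i = b; i < b + BLOCK; i++)
                a[i] *= f;
        }
    }   /* all tasks have completed by the end of the parallel region */
}
```

Both functions produce the same result. With a small `BLOCK`, the task version creates many tasks and pays the creation and management overhead the abstract mentions; with a large `BLOCK`, it exposes less parallelism. The worksharing task described in the abstract targets that trade-off by letting a single task spread its loop iterations across several threads. When compiled without OpenMP support, the pragmas are ignored and both functions run serially.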
- Published
2020