Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs

Authors :: José Nelson Amaral
Ettore Tiotto
Xavier Martorell
Michail Alvanos
Montse Farreras
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Source :: Recercat. Dipósit de la Recerca de Catalunya, instname, UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
Publication Year :: 2016
Publisher :: Elsevier BV, 2016.
Abstract: We improve the performance of fine-grain UPC applications by orders of magnitude. We introduce a novel shared-data localization transformation. We present a thorough performance analysis and evaluation. We show that reducing run-time calls is crucial for performance. We achieve performance comparable to C and MPI using the UPC programming model. Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses into larger remote-access operations. A straightforward implementation of the inspector-executor transformation results in excessive instrumentation that hinders performance. This paper addresses this issue and introduces several techniques that aim to reduce the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs) [S. Aarseth, Gravitational N-Body Simulations: Tools and Algorithms, Cambridge Monographs on Mathematical Physics, Cambridge University Press, 2003], the inlining of data locality checks, and the use of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that were propagated through the loop body. A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations increase the performance of UPC programs to up to 1.8× that of their hand-optimized UPC counterparts for applications with regular accesses and up to 6.3× for applications with irregular accesses.
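
Illustration (not from the paper): the inspector-executor idea mentioned in the abstract can be sketched in plain C. This is a minimal single-process simulation under stated assumptions, not the authors' compiler transformation or any UPC runtime API. The names `remote_data`, `remote_fetches`, `bulk_gather`, and `inspector_executor_sum` are hypothetical; `remote_data` stands in for a UPC `shared` array, and `bulk_gather` stands in for one coalesced remote transfer.

```c
#include <assert.h>
#include <stdlib.h>

#define N 16

/* Hypothetical stand-in for a remote shared array. */
static double remote_data[N];

/* Counts simulated communication operations (assumption: one per bulk call). */
static int remote_fetches = 0;

/* Hypothetical coalesced transfer: a single simulated message
 * gathers every element named in the index vector. */
static void bulk_gather(double *dst, const int *idx, int count) {
    remote_fetches++;
    for (int i = 0; i < count; i++) {
        dst[i] = remote_data[idx[i]];
    }
}

/* Computes the sum of remote_data[index[i]] for i in [0, n). */
static double inspector_executor_sum(const int *index, int n) {
    if (n <= 0) {
        return 0.0;
    }

    int *idx_vec = malloc((size_t)n * sizeof *idx_vec);
    double *local = malloc((size_t)n * sizeof *local);
    assert(idx_vec != NULL && local != NULL);

    /* Inspector: run the loop's address computation only,
     * recording each shared index in an index vector. */
    for (int i = 0; i < n; i++) {
        idx_vec[i] = index[i];
    }

    /* Coalescing step: fetch all recorded elements in one transfer
     * instead of issuing one fine-grained access per iteration. */
    bulk_gather(local, idx_vec, n);

    /* Executor: run the original loop body against the local buffer. */
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += local[i];
    }

    free(idx_vec);
    free(local);
    return sum;
}
```

Calling `inspector_executor_sum` on an index array of length `n` performs one simulated transfer rather than `n` separate accesses, which is the effect the coalescing is meant to achieve. The extra inspector loop and index vector are examples of the instrumentation overhead that the abstract says must be kept small.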

Subjects :: Computer Networks and Communications
Computer science
Parallel programming (Computer science)
Optimizing compiler
02 engineering and technology
Parallel computing
Programació en paral·lel (Informàtica)
Theoretical Computer Science
Artificial Intelligence
Unified Parallel C
0202 electrical engineering, electronic engineering, information engineering
Code (cryptography)
Compiler optimization
Instrumentation (computer programming)
Partitioned global address space
Informàtica::Arquitectura de computadors::Arquitectures paral·leles [Àrees temàtiques de la UPC]
Address space
Communication
Locality
Computer Graphics and Computer-Aided Design
020202 computer hardware & architecture
Hardware and Architecture
Programming paradigm
020201 artificial intelligence & image processing
Software

Details

ISSN :: 0167-8191
Volume :: 54
Database :: OpenAIRE
Journal :: Parallel Computing
Accession number :: edsair.doi.dedup.....e2acc2440a3265d300d6c1b54aa1d94c
