Philippe Clauss, Alain Ketterlin, Compilation pour les Architectures MUlti-coeurS (CAMUS), Laboratoire des Sciences de l'Image, de l'Informatique et de la Télédétection (LSIIT), Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire de Sciences de l'Image, de l'Informatique et de la Télédétection, équipe ICPS (LSIIT / ICPS), Centre National de la Recherche Scientifique (CNRS), MULTICORE, Institut National de Recherche en Informatique et en Automatique (Inria), and Clauss, Philippe
International audience; This paper describes a tool using one or more executions of a sequential program to detect parallel portions of the program. The tool, called Parwiz, uses dynamic binary instrumentation, targets various forms of parallelism, and suggests distinct parallelization actions, ranging from simple directive tagging to elaborate loop transformations. The first part of the paper details the link between the program's static structures (like routines and loops), the memory accesses performed by the program, and the dependencies that are used to highlight potential parallelism. This part also describes the instrumentation involved, and the general architecture of the system. The second part of the paper puts the framework into action. The first study focuses on loop parallelism, targeting OpenMP parallel- for directives, including privatization when necessary. The second study is an adaptation of a well-known vectorization technique based on a slightly richer dependence description, where the tool suggests an elaborate loop transformation. The third study views loops as a graph of (hopefully lightly) dependent iterations. The third part of the paper explains how the overall cost of data- dependence profiling can be reduced. This cost has two major causes: first, instrumenting memory accesses slows down the program, and second, turning memory accesses into dependence graphs consumes processing time. Parwiz uses static analysis of the original (binary) program to provide data at a coarser level, moving from individual accesses to complete loops whenever possible, thereby reducing the impact of both sources of inefficiency.