1. High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic Controllers
- Author
Vito Giovanni Castellana, Antonino Tumeo, and Fabrizio Ferrandi
- Subjects
Schedule, Finite-state machine, Speedup, Computer science, Interface (computing), Finite element analysis, Dynamic scheduling, Parallel computing, Tools, Hardware, Memory management, Annotations, Parallel processing, Parallel processing (DSP implementation), High-level synthesis, Synchronization (computer science)
- Abstract
Conventional High-Level Synthesis (HLS) tools exploit parallelism mostly at the Instruction Level (ILP). They statically schedule the input specifications and build centralized Finite State Machine (FSM) controllers. However, aggressive exploitation of ILP has diminishing returns in many applications, and centralized approaches usually do not efficiently exploit coarser-grained parallelism, because FSMs are inherently serial. In this paper we present an HLS framework able to synthesize applications that, besides ILP, also expose Task Level Parallelism (TLP). An application can expose TLP through annotations that identify the parallel functions (i.e., tasks). To generate accelerators that efficiently execute concurrent tasks, we need to address several issues: devising a mechanism to support concurrent execution flows, exploiting memory parallelism, and managing synchronization. To support concurrent execution flows, we introduce a novel adaptive controller. The adaptive controller is composed of a set of interacting control elements that independently manage the execution of a task. These control elements check dependencies and resource constraints at runtime, enabling as-soon-as-possible execution. To support parallel access to shared memories and synchronization, we integrate a novel Hierarchical Memory Interface (HMI). Compared to previous solutions, the proposed interface supports multi-ported memories and atomic memory operations, which commonly occur in parallel programming. Our framework can generate the hardware implementation of C functions using two different approaches, depending on each function's characteristics. If a function exposes TLP, the framework generates a hardware implementation based on the adaptive controller. Otherwise, it implements the function through the FSM approach, which is optimized for ILP exploitation.
We evaluate our framework on a set of parallel applications and show substantial performance improvements (average speedup of 4.7×) with limited area overheads (average area increase of 5.48×).
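The adaptive controller described in the abstract can be illustrated with a small cycle-level simulation. This is a minimal sketch under simplifying assumptions, not the authors' implementation: the `Task` model, the resource types `mem` and `alu`, the unit counts, and the function `simulate_adaptive` are all hypothetical. Each task gets its own control element that checks dependencies and resource availability at runtime and starts its task as soon as both are satisfied. The serial baseline, which runs one task at a time, is only a rough stand-in for a centralized FSM controller.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    latency: int   # cycles the task occupies one unit
    resource: str  # type of unit it needs (hypothetical model)
    deps: list = field(default_factory=list)  # predecessor task names


def simulate_adaptive(tasks, units):
    """One control element per task: each cycle, every waiting element checks
    at runtime that its predecessors have finished and that a unit of its type
    is free, and starts its task as soon as both hold.
    Returns {name: (start_cycle, end_cycle)}."""
    by_name = {t.name: t for t in tasks}
    free = dict(units)  # free units per resource type
    running = {}        # task name -> end cycle
    done, sched = set(), {}
    cycle = 0
    while len(done) < len(tasks):
        # Retire tasks finishing this cycle and release their units.
        for name, end in list(running.items()):
            if end == cycle:
                del running[name]
                done.add(name)
                free[by_name[name].resource] += 1
        # Each waiting control element independently checks dependencies
        # and resource availability; list order acts as a fixed priority.
        for t in tasks:
            if t.name in done or t.name in running:
                continue
            if all(d in done for d in t.deps) and free[t.resource] > 0:
                free[t.resource] -= 1
                running[t.name] = cycle + t.latency
                sched[t.name] = (cycle, cycle + t.latency)
        cycle += 1
    return sched


tasks = [
    Task("A", 3, "mem"),
    Task("B", 4, "alu"),
    Task("C", 2, "alu", ["A"]),
    Task("D", 1, "alu", ["B", "C"]),
]
sched = simulate_adaptive(tasks, {"mem": 1, "alu": 2})
makespan = max(end for _, end in sched.values())
serial = sum(t.latency for t in tasks)  # one task at a time
print(sched)
print(makespan, serial)  # 6 10
```

In this example, B starts alongside A, and C starts the moment A completes, so the four tasks finish at cycle 6 instead of 10. The fixed iteration order acts as a priority when control elements compete for a unit; an actual hardware controller would need its own arbitration policy.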
- Published
- 2021