8 results on '"Aninda Manocha"'
Search Results
2. The Implications of Page Size Management on Graph Analytics
- Author
- Aninda Manocha, Zi Yan, Esin Tureci, Juan Luis Aragon, David Nellans, and Margaret Martonosi
- Published
- 2022
3. GraphAttack
- Author
- Tyler Sorensen, Aninda Manocha, Opeoluwa Matthews, Margaret Martonosi, Juan L. Aragón, and Esin Tureci
- Subjects
Multi-core processor, Speedup, Memory hierarchy, Computer science, Data parallelism, Parallel computing, computer.software_genre, Software framework, Hardware and Architecture, Scalability, Graph (abstract data type), computer, Queue, Software, Information Systems
- Abstract
Graph structures are a natural representation of important and pervasive data. While graph applications have significant parallelism, their characteristic pointer indirect loads to neighbor data hinder scalability to large datasets on multicore systems. A scalable and efficient system must tolerate latency while leveraging data parallelism across millions of vertices. Modern Out-of-Order (OoO) cores inherently tolerate a fraction of long latencies, but become clogged when running severely memory-bound applications. Combined with large power/area footprints, this limits their parallel scaling potential and, consequently, the gains that existing software frameworks can achieve. Conversely, accelerator and memory hierarchy designs provide performant hardware specializations, but cannot support diverse application demands. To address these shortcomings, we present GraphAttack, a hardware-software data supply approach that accelerates graph applications on in-order multicore architectures. GraphAttack proposes compiler passes to (1) identify idiomatic long-latency loads and (2) slice programs along these loads into data Producer/Consumer threads to map onto pairs of parallel cores. Each pair shares a communication queue; the Producer asynchronously issues long-latency loads, whose results are buffered in the queue and used by the Consumer. This scheme drastically increases memory-level parallelism (MLP) to mitigate latency bottlenecks. In equal-area comparisons, GraphAttack outperforms OoO cores, do-all parallelism, prefetching, and prior decoupling approaches, achieving a 2.87× speedup and 8.61× gain in energy efficiency across a range of graph applications. These improvements scale; GraphAttack achieves a 3× speedup over 64 parallel cores. Lastly, it has pragmatic design principles; it enhances in-order architectures that are gaining increasing open-source support.
- Published
- 2021
4. Bayesian Optimization for Efficient Accelerator Synthesis
- Author
- Atefeh Mehrabi, Benjamin C. Lee, Aninda Manocha, and Daniel J. Sorin
- Subjects
Computer science, Design space exploration, business.industry, Bayesian probability, Bayesian optimization, Resource (project management), Hardware and Architecture, High-level synthesis, Embedded system, Code (cryptography), Field-programmable gate array, business, Software, Information Systems, Electronic circuit
- Abstract
Accelerator design is expensive due to the effort required to understand an algorithm and optimize the design. Architects have embraced two technologies to reduce costs. High-level synthesis automatically generates hardware from code. Reconfigurable fabrics instantiate accelerators while avoiding fabrication costs for custom circuits. We further reduce design effort with statistical learning. We build an automated framework, called Prospector, that uses Bayesian techniques to optimize synthesis directives, reducing execution latency and resource usage in field-programmable gate arrays. We show that, within a given amount of time, designs discovered by Prospector are closer to Pareto-efficient designs than those found by prior approaches. Prospector permits new studies for heterogeneous accelerators.
- Published
- 2020
5. AutoSVA: Democratizing Formal Verification of RTL Module Interactions
- Author
- Margaret Martonosi, David Wentzlaff, Aninda Manocha, and Marcelo Orenes-Vera
- Subjects
FOS: Computer and information sciences, Programming language, Computer science, Liveness, SystemVerilog, computer.software_genre, Learning curve, Hardware Architecture (cs.AR), Effective method, Electronic design automation, Control logic, Computer Science - Hardware Architecture, Formal verification, computer, computer.programming_language
- Abstract
Modern SoC design relies on the ability to separately verify IP blocks relative to their own specifications. Formal verification (FV) using SystemVerilog Assertions (SVA) is an effective method to exhaustively verify blocks at the unit level. Unfortunately, FV has a steep learning curve and requires engineering effort that discourages hardware designers from using it during RTL module development. We propose AutoSVA, a framework to automatically generate FV testbenches that verify liveness and safety of control logic involved in module interactions. We demonstrate AutoSVA’s effectiveness and efficiency on deadlock-critical modules of widely used open-source hardware projects.
- Published
- 2021
6. A simulator and compiler framework for agile hardware-software co-design evaluation and exploration
- Author
- Margaret Martonosi, Juan L. Aragón, Aninda Manocha, Tyler Sorensen, Marcelo Orenes-Vera, and Esin Tureci
- Subjects
010302 applied physics, business.industry, Computer science, 02 engineering and technology, Software prototyping, Modular design, computer.software_genre, 01 natural sciences, 020202 computer hardware & architecture, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, Compiler, business, computer, Simulation, Agile software development
- Abstract
As Moore's Law has slowed and Dennard Scaling has ended, architects are increasingly turning to heterogeneous parallelism and hardware-software co-design. These trends present new challenges for simulation-based performance assessments that are central to early-stage architectural exploration. Simulators must be lightweight to support heterogeneous combinations of general-purpose cores and specialized processing units. They must also support agile exploration of hardware-software co-design, i.e., changes in the programming model, compiler, ISA, and specialized hardware. To meet these challenges, we describe our compiler and simulator pair: DEC++ and MosaicSim. Together, they provide a lightweight, modular simulator for heterogeneous systems, offering accuracy and agility designed specifically for hardware-software co-design explorations. The simulator and corresponding compiler were developed as part of the DECADES project, a multi-team effort to design and tape out a new heterogeneous architecture. We will present two case studies in important data-science applications where DEC++ and MosaicSim enable straightforward design space explorations for emerging full-stack systems.
- Published
- 2020
7. MosaicSim: A Lightweight, Modular Simulator for Heterogeneous Systems
- Author
- Juan L. Aragón, Margaret Martonosi, Tyler Sorensen, Luca P. Carloni, Opeoluwa Matthews, Marcelo Orenes-Vera, Tae Jun Ham, Davide Giri, Aninda Manocha, and Esin Tureci
- Subjects
010302 applied physics, Multi-core processor, Dennard scaling, business.industry, Computer science, 02 engineering and technology, Modular design, computer.software_genre, 01 natural sciences, Modularity, Toolchain, 020202 computer hardware & architecture, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, Compiler, business, computer, Simulation, Agile software development
- Abstract
As Moore's Law has slowed and Dennard Scaling has ended, architects are increasingly turning to heterogeneous parallelism and domain-specific hardware-software co-designs. These trends present new challenges for simulation-based performance assessments that are central to early-stage architectural exploration. Simulators must be lightweight to support rich heterogeneous combinations of general-purpose cores and specialized processing units. They must also support agile exploration of hardware-software co-design, i.e., changes in the programming model, compiler, ISA, and specialized hardware. To meet these challenges, we introduce MosaicSim, a lightweight, modular simulator for heterogeneous systems, offering accuracy and agility designed specifically for hardware-software co-design explorations. By integrating the LLVM toolchain, MosaicSim enables efficient modeling of instruction dependencies and flexible additions across the stack. Its modularity also allows the composition and integration of different hardware components. We first demonstrate that MosaicSim captures architectural bottlenecks in applications, and accurately models both scaling trends in a multicore setting and accelerator behavior. We then present two case studies where MosaicSim enables straightforward design space explorations for emerging systems, i.e., data science application acceleration and heterogeneous parallel architectures.
- Published
- 2020
8. Prospector: Synthesizing Efficient Accelerators via Statistical Learning
- Author
- Aninda Manocha, Daniel J. Sorin, Atefeh Mehrabi, and Benjamin C. Lee
- Subjects
010302 applied physics, Design space exploration, Computer science, Latency (audio), 02 engineering and technology, 01 natural sciences, 020202 computer hardware & architecture, Resource (project management), Computer architecture, High-level synthesis, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), Field-programmable gate array, Electronic circuit
- Abstract
Accelerator design is expensive due to the effort required to understand an algorithm and optimize the design. Architects have embraced two technologies to reduce costs. High-level synthesis automatically generates hardware from code. Reconfigurable fabrics instantiate accelerators while avoiding fabrication costs for custom circuits. We further reduce design effort with statistical learning. We build an automated framework, called Prospector, that uses Bayesian techniques to optimize synthesis directives, reducing execution latency and resource usage in field-programmable gate arrays. We show that, within a given amount of time, designs discovered by Prospector are closer to Pareto-efficient designs than those found by prior approaches.
- Published
- 2020
Discovery Service for Jio Institute Digital Library