Author: "Benini, Luca" / Publisher: association for computing machinery - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Benini, Luca"' showing total 247 results

1. DNN Is Not All You Need: Parallelizing Non-neural ML Algorithms on Ultra-low-power IoT Processors.

Author: TABANELLI, ENRICO, TAGLIAVINI, GIUSEPPE, and BENINI, LUCA
Subjects: MACHINE learning, ARTIFICIAL neural networks, EMULATION software, INTERNET of things, PARALLEL algorithms, PARALLEL processing
Abstract: Machine Learning (ML) functions are becoming ubiquitous in latency- and privacy-sensitive IoT applications, prompting a shift toward near-sensor processing at the extreme edge and the consequent increasing adoption of Parallel Ultra-low-power (PULP) IoT processors. These compute- and memory-constrained parallel architectures need to run efficiently a wide range of algorithms, including key Non-neural ML kernels that compete favorably with Deep Neural Networks in terms of accuracy under severe resource constraints. In this article, we focus on enabling efficient parallel execution of Non-neural ML algorithms on two RISCV based PULP platforms, namely, GAP8, a commercial chip, and PULP-OPEN, a research platform running on an FPGA emulator. We optimized the parallel algorithms through a fine-grained analysis and intensive optimization to maximize the speedup, considering two alternative Floating-point (FP) emulation libraries on GAP8 and the native FPU support on PULP-OPEN. Experimental results show that a target-optimized emulation library can lead to an average 1.61× runtime improvement and 37% energy reduction compared to a standard emulation library, while the native FPU support reaches up to 32.09× and 99%, respectively. In terms of parallel speedup, our design improves the sequential execution by 7.04× on average on the targeted octa-core platforms leading to energy and latency decrease up to 87%. Last, we present a comparison with the ARM Cortex-M4 microcontroller, a widely adopted commercial solution for edge deployments, which is 12.87× slower than PULP-OPEN. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

2. Dataflow Driven Partitioning of Machine Learning Applications for Optimal Energy Use in Batteryless Systems.

Author: GOMEZ, ANDRES, TRETTER, ANDREAS, HAGER, PASCAL ALEXANDER, SANMUGARAJAH, PRAVEENTH, BENINI, LUCA, and THIELE, LOTHAR
Subjects: ENERGY consumption, ENERGY storage, ENERGY harvesting, RELIABILITY in engineering, INTERNET of things, MACHINE learning
Abstract: Sensing systems powered by energy harvesting have traditionally been designed to tolerate long periods without energy. As the Internet of Things (IoT) evolves toward a more transient and opportunistic execution paradigm, reducing energy storage costs will be key for its economic and ecologic viability. However, decreasing energy storage in harvesting systems introduces reliability issues. Transducers only produce intermittent energy at low voltage and current levels, making guaranteed task completion a challenge. Existing ad hoc methods overcome this by buffering enough energy either for single tasks, incurring large data-retention overheads, or for one full application cycle, requiring a large energy buffer. We present Julienning: an automated method for optimizing the total energy cost of batteryless applications. Using a custom specification model, developers can describe transient applications as a set of atomically executed kernels with explicit data dependencies. Our optimization flow can partition data- and energy-intensive applications into multiple execution cycles with bounded energy consumption. By leveraging interkernel data dependencies, these energy-bounded execution cycles minimize the number of system activations and nonvolatile data transfers, and thus the total energy overhead. We validate our methodology with two batteryless cameras running energy-intensive machine learning applications. Using a solar testbed, we replay real-world illuminance traces to experimentally demonstrate optimized batteryless execution with a transducer-to-application energy efficiency of 74.5%. Partitioning results demonstrate that compared to ad hoc solutions, our method can reduce the required energy storage by over 94% while only incurring a 0.12% energy overhead. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

3. A Construction Kit for Efficient Low Power Neural Network Accelerator Designs.

Author: JOKIC, PETAR, AZARKHISH, ERFAN, BONETTI, ANDREA, PONS, MARC, EMERY, STEPHANE, and BENINI, LUCA
Subjects: MATHEMATICAL optimization
Abstract: Implementing embedded neural network processing at the edge requires efficient hardware acceleration that combines high computational throughput with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly being adapted to support the improved functionalities. Hardware designers can refer to a myriad of accelerator implementations in the literature to evaluate and compare hardware design choices. However, the sheer number of publications and their diverse optimization directions hinder an effective assessment. Existing surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effects of each utilized optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing-down the research progress. In contrast to previous surveys, this work provides a quantitative overview of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. The list of optimizations and their quantitative effects are presented as a construction kit, allowing to assess the design choices for each building block individually. Reported optimizations range from up to 10,000× memory savings to 33× energy reductions, providing chip designers with an overview of design choices for implementing efficient low power neural network accelerators. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

4. The Predictable Execution Model in Practice: Compiling Real Applications for COTS Hardware.

Author: FORSBERG, BJÖRN, SOLIERI, MARCO, BERTOGNA, MARKO, BENINI, LUCA, and MARONGIU, ANDREA
Subjects: RANDOM access memory, PROBLEM solving, COMPILERS (Computer programs)
Abstract: Adoption of multi- and many-core processors in real-time systems has so far been slowed down, if not totally barred, due to the difficulty in providing analytical real-time guarantees on worst-case execution times. The Predictable Execution Model (PREM) has been proposed to solve this problem, but its practical support requires significant code refactoring, a task better suited for a compilation tool chain than human programmers. Implementing a PREM compiler presents significant challenges to conform to PREM requirements, such as guaranteed upper bounds on memory footprint and the generation of efficient schedulable non-preemptive regions. This article presents a comprehensive description on how a PREM compiler can be implemented, based on several years of experience from the community. We provide accumulated insights on how to best balance conformance to real-time requirements and performance and present novel techniques that extend the applicability from simple benchmark suites to real-world applications. We show that code transformed by the PREM compiler enables timing predictable execution on modern commercial off-the-shelf hardware, providing novel insights on how PREM can protect 99.4% of memory accesses on random replacement policy caches at only 16% performance loss on benchmarks from the PolyBench benchmark suite. Finally, we show that the requirements imposed on the programming model are well-aligned with current coding guidelines for timing critical software, promoting easy adoption. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

5. Extending the RISC-V ISA for Efficient RNN-based 5G Radio Resource Management.

Author: Andri, Renzo, Henriksson, Tomas, and Benini, Luca
Subjects: RECURRENT neural networks, 5G networks, RADIO resource management, COMPUTER software, COMPUTER simulation
Abstract: Radio Resource Management in 5G mobile communication is a challenging problem for which Recurrent Neural Networks (RNN) have shown promising results. Accelerating the compute-intensive RNN inference is therefore of utmost importance. Programmable solutions are desirable for effective 5G-RRM coping with the rapidly evolving landscape of RNN variations. In this paper, we investigate RNN inference acceleration by tuning both the instruction set and micro-architecture of a micro-controller-class open-source RISC-V core. We couple HW extensions with software optimizations to achieve an overall improvement in throughput and energy efficiency of 15x and 10x w.r.t. the baseline core on a wide range of RNNs used in various RRM tasks. [ABSTRACT FROM AUTHOR]
Published: 2020

6. ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor.

Author: Kurth, Andreas, Riedel, Samuel, Zaruba, Florian, Hoefler, Torsten, and Benini, Luca
Subjects: MULTIPROCESSORS, COMPUTER algorithms, ELECTRIC controllers, BANDWIDTHS, DATA analysis
Abstract: Atomic operations are crucial for most modern parallel and concurrent algorithms, which necessitates their optimized implementation in highly-scalable manycore processors. We propose a modular and efficient, open-source ATomic UNit (ATUN) architecture that can be placed flexibly at different levels of the memory hierarchy. ATUN demonstrates near-optimal linear scaling for various synthetic and real-world workloads on an FPGA prototype with 32 RISC-V cores. We characterize the hardware complexity of our ATUN design in 22 nm FDSOI and find that it scales linearly in area (only 0.5 kGE per core) and logarithmically in the critical path. [ABSTRACT FROM AUTHOR]
Published: 2020

7. Paving the Way Toward Energy-Aware and Automated Datacentre.

Author: Bartolini, Andrea, Beneventi, Francesco, Borghesi, Andrea, Cesarini, Daniele, Libri, Antonio, Benini, Luca, and Cavazzoni, Carlo
Published: 2019
Full Text: View/download PDF

8. HERO.

Author: Kurth, Andreas, Capotondi, Alessandro, Vogel, Pirmin, Benini, Luca, and Marongiu, Andrea
Published: 2018
Full Text: View/download PDF

9. COUNTDOWN.

Author: Cesarini, Daniele, Bartolini, Andrea, Bonfà, Piero, Cavazzoni, Carlo, and Benini, Luca
Published: 2018
Full Text: View/download PDF

10. Evaluation of NTP/PTP fine-grain synchronization performance in HPC clusters.

Author: Libri, Antonio, Bartolini, Andrea, Cesarini, Daniele, and Benini, Luca
Published: 2018
Full Text: View/download PDF

11. PULP-HD: Accelerating Brain-Inspired High-Dimensional Computing on a Parallel Ultra-Low Power Platform.

Author: Montagna, Fabio, Rahimi, Abbas, Benatti, Simone, Rossi, Davide, and Benini, Luca
Subjects: COMPUTING platforms, ARITHMETIC, DECODING algorithms, MULTIVARIATE analysis, TELECOMMUNICATION equipment
Abstract: Computing with high-dimensional (HD) vectors, also referred to as hypervectors, is a brain-inspired alternative to computing with scalars. Key properties of HD computing include a well-defined set of arithmetic operations on hypervectors, generality, scalability, robustness, fast learning, and ubiquitous parallel operations. HD computing is about manipulating and comparing large patterns--binary hypervectors with 10,000 dimensions--making its efficient realization on minimalistic ultra-low-power platforms challenging. This paper describes HD computing's acceleration and its optimization of memory accesses and operations on a silicon prototype of the PULPv3 4-core platform (1.5mm², 2 mW), surpassing the state-of-the-art classification accuracy (on average 92.4%) with simultaneous 3.7x end-to-end speed-up and 2x energy saving compared to its single-core execution. We further explore the scalability of our accelerator by increasing the number of inputs and classification window on a new generation of the PULP architecture featuring bit-manipulation instruction extensions and larger number of 8 cores. These together enable a near ideal speed-up of 18.4x compared to the single-core PULPv3. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

12. Multi-core data analytics SoC with a flexible 1.76 Gbit/s AES-XTS cryptographic accelerator in 65 nm CMOS.

Author: Gürkaynak, Frank K., Schilling, Robert, Muehlberghuber, Michael, Conti, Francesco, Mangard, Stefan, and Benini, Luca
Published: 2017
Full Text: View/download PDF

13. An Ultra-Low Power Address-Event Sensor Interface for Energy-Proportional Time-to-Information Extraction.

Author: Di Mauro, Alfio, Conti, Francesco, and Benini, Luca
Subjects: DETECTORS, MICROCONTROLLERS, ELECTRONIC controllers, MICROPROCESSORS, CONSUMPTION (Economics)
Abstract: Internet-of-Things devices need sensors with low power footprint and capable of producing semantically rich data. Promising candidates are spiking sensors that use asynchronous Address-Event Representation (AER) carrying information within inter-spike times. To minimize the overhead of coupling AER sensors with off-the-shelf microcontrollers, we propose an FPGA-based methodology that i) tags the AER spikes with timestamps to make them carriable by standard interfaces (e.g. I2S, SPI); ii) uses a recursively divided clock generated on-chip by a pausable ring-oscillator, to reduce power while keeping accuracy above 97% on timestamps. We prototyped our methodology on an IGLOOnano AGLN250 FPGA, consuming less than 4.5mW under a 550kevt/s spike rate (i.e. a noisy environment), and down to 50uW in absence of spikes. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

14. Lightweight IO virtualization on MPU enabled microcontrollers.

Author: Paci, Francesco, Brunelli, Davide, and Benini, Luca
Published: 2018
Full Text: View/download PDF

15. Energy-efficient GPGPU Architectures via Collaborative Compilation and Memristive Memory-based Computing

Author: Rahimi, Abbas Farrokh, Ghofrani, Amirali, Angel, Miguel, Cheng, Tim Kwang-Ting, Benini, Luca, and Gupta, Rajesh Kumar C.
Abstract: Thousands of deep and wide pipelines working concurrently make GPGPU high power consuming parts. Energy-efficiency techniques employ voltage overscaling that increases timing sensitivity to variations and hence aggravating the energy use issues. This paper proposes a method to increase spatiotemporal reuse of computational effort by a combination of compilation and micro-architectural design. An associative memristive memory (AMM) module is integrated with the floating point units (FPUs). Together, we enable fine-grained partitioning of values and find high-frequency sets of values for the FPUs by searching the space of possible inputs, with the help of application-specific profile feedback. For every kernel execution, the compiler pre-stores these high-frequent sets of values in AMM modules, representing partial functionality of the associated FPU, that are concurrently evaluated over two clock cycles. Our simulation results show high hit rates with 32-entry AMM modules that enable 36% reduction in average energy use by the kernel codes. Compared to voltage overscaling, this technique enhances robustness against timing errors with 39% average energy saving.
Published: 2014

16. Efficient Virtual Memory Sharing via On-Accelerator Page Table Walking in Heterogeneous Embedded SoCs.

Author: VOGEL, PIRMIN, KURTH, ANDREAS, WEINBUCH, JOHANNES, MARONGIU, ANDREA, and BENINI, LUCA
Subjects: HETEROGENEOUS computing, SYSTEMS on a chip, VIRTUAL storage (Computer science), EMBEDDED computer systems, COMPUTER software
Abstract: Shared virtual memory is key in heterogeneous systems on chip (SoCs) that combine a general-purpose host processor with a many-core accelerator, both for programmability and performance. In contrast to the full-blown, hardware-only solutions predominant in modern high-end systems, lightweight hardware-software co-designs are better suited in the context of more power- and area-constrained embedded systems and provide additional benefits in terms of flexibility and predictability. As a downside, the latter solutions require the host to handle in software synchronization in case of page misses as well as miss handling. This may incur considerable run-time overheads. In this work, we present a novel hardware-software virtual memory management approach for many-core accelerators in heterogeneous embedded SoCs. It exploits an accelerator-side helper thread concept that enables the accelerator to manage its virtual memory hardware autonomously while operating cache-coherently on the page tables of the user-space processes of the host. This greatly reduces overhead with respect to host-side solutions while retaining flexibility. We have validated the design with a set of parameterizable benchmarks and real-world applications covering various application domains. For purely memory-bound kernels, the accelerator performance improves by a factor of 3.8 compared with host-based management and lies within 50% of a lower-bound ideal memory management unit. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

17. Efficient Virtual Memory Sharing via On-Accelerator Page Table Walking in Heterogeneous Embedded SoCs.

Author: VOGEL, PIRMIN, KURTH, ANDREAS, WEINBUCH, JOHANNES, MARONGIU, ANDREA, and BENINI, LUCA
Subjects: VIRTUAL storage (Computer science), SYSTEMS on a chip, COMPUTER input-output equipment, EMBEDDED computer systems, COMPUTER software
Abstract: Shared virtual memory is key in heterogeneous systems on chip (SoCs) that combine a general-purpose host processor with a many-core accelerator, both for programmability and performance. In contrast to the full-blown, hardware-only solutions predominant in modern high-end systems, lightweight hardware-software co-designs are better suited in the context of more power- and area-constrained embedded systems and provide additional benefits in terms of flexibility and predictability. As a downside, the latter solutions require the host to handle in software synchronization in case of page misses as well as miss handling. This may incur considerable run-time overheads. In this work, we present a novel hardware-software virtual memory management approach for many-core accelerators in heterogeneous embedded SoCs. It exploits an accelerator-side helper thread concept that enables the accelerator to manage its virtual memory hardware autonomously while operating cache-coherently on the page tables of the user-space processes of the host. This greatly reduces overhead with respect to host-side solutions while retaining flexibility. We have validated the design with a set of parameterizable benchmarks and real-world applications covering various application domains. For purely memory-bound kernels, the accelerator performance improves by a factor of 3.8 compared with host-based management and lies within 50% of a lower-bound ideal memory management unit. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

18. An Evaluation of Memory Sharing Performance for Heterogeneous Embedded SoCs with Many-Core Accelerators.

Author: Vogel, Pirmin, Marongiu, Andrea, and Benini, Luca
Published: 2015
Full Text: View/download PDF

19. Runtime Support for Multiple Offload-Based Programming Models on Embedded Manycore Accelerators.

Author: Capotondi, Alessandro, Haugou, Germain, Marongiu, Andrea, and Benini, Luca
Published: 2015
Full Text: View/download PDF

20. Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs.

Author: Vogel, Pirmin, Marongiu, Andrea, and Benini, Luca
Published: 2015

21. VirtualSoC: A Research Tool for Modern MPSoCs.

Author: BORTOLOTTI, DANIELE, MARONGIU, ANDREA, and BENINI, LUCA
Subjects: COMPUTER system design & construction, HIGH performance computing, EMBEDDED computer systems, COMPUTER simulation, COMPUTER operating systems, CENTRAL processing units
Abstract: Architectural heterogeneity has proven to be an effective design paradigm to cope with an ever-increasing demand for computational power within tight energy budgets, in virtually every computing domain. Programmable manycore accelerators are currently widely used not only in high-performance computing systems, but also in embedded devices, in which they operate as coprocessors under the control of a general-purpose CPU (the host processor). Clearly, such powerful hardware architectures are paired with sophisticated and complex software ecosystems, composed of operating systems, programming models plus associated runtime engines, and increasingly complex user applications with related libraries. System modeling has always played a key role in early architectural exploration or software development when the real hardware is not available. The necessity of efficiently coping with the huge HW/SW design space provided by the described heterogeneous Systems on Chip (SoCs) calls for advanced full-system simulation methodologies and tools, capable of assessing various metrics for the functional and nonfunctional properties of the target system. In this article, we describe VirtualSoC, a simulation tool targeting the full-system simulation of massively parallel heterogeneous SoCs. We also describe how VirtualSoC has been successfully adopted in several research projects. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

22. Quantifying the impact of variability on the energy efficiency for a next-generation ultra-green supercomputer.

Author: Fraternali, Francesco, Bartolini, Andrea, Cavazzoni, Carlo, Tecchiolli, Giampietro, and Benini, Luca
Published: 2014
Full Text: View/download PDF

23. Approximate compressed sensing.

Author: Bortolotti, Daniele, Mamaghanian, Hossein, Bartolini, Andrea, Ashouei, Maryam, Stuijt, Jan, Atienza, David, Vandergheynst, Pierre, and Benini, Luca
Published: 2014
Full Text: View/download PDF

24. A Virtualization Framework for IOMMU-less Many-Core Accelerators.

Author: Pinto, Christian, Marongiu, Andrea, and Benini, Luca
Published: 2014
Full Text: View/download PDF

25. Energy-Efficient GPGPU Architectures via Collaborative Compilation and Memristive Memory-Based Computing.

Author: Rahimi, Abbas, Ghofrani, Amirali, Lastras-Montano, Miguel Angel, Cheng, Kwang-Ting, Benini, Luca, and Gupta, Rajesh K.
Published: 2014
Full Text: View/download PDF

26. An Approximate Computing Technique for Reducing the Complexity of a Direct-Solver for Sparse Linear Systems in Real-Time Video Processing.

Author: Schaffner, Michael, Gürkaynak, Frank K., Smolic, Aljosa, Kaeslin, Hubert, and Benini, Luca
Published: 2014
Full Text: View/download PDF

27. Guaranteed Computational Resprinting via Model-Predictive Control.

Author: TILLI, ANDREA, BARTOLINI, ANDREA, CACCIARI, MATTEO, and BENINI, LUCA
Subjects: MATHEMATICAL models, HEAT capacity, SILICON, PHASE change materials, STEADY state conduction
Abstract: Today and future many-core systems are facing the utilization wall and dark silicon problems, for which not all the processing engines can be powered at the same time as this will lead to a power consumption higher than the Total Design Power (TDP) budget. Recently, computational sprinting approaches addressed the problem by exploiting the intrinsic thermal capacitance of the chip and the properties of common applications, which require intense, but temporary, use of resources. The thermal capacitance, possibly augmented with phase change materials, enables the temporary activation of all the resources simultaneously, although they largely exceed the steady-state thermal design power. In this article, we present an innovative and low-overhead hierarchical model-predictive controller for managing thermally safe sprinting with predictable resprinting rate, which ensures the correct execution of mixed-criticality tasks. Well-targeted simulations, also based on real workload benchmarks, show the applicability and the effectiveness of our solution. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

28. Workload and User Experience-Aware Dynamic Reliability Management in Multicore Processors.

Author: Mercati, Pietro, Bartolini, Andrea, Paterna, Francesco, Rosing, Tajana Simunic, and Benini, Luca
Subjects: MULTICORE processors, WORKLOAD of computer networks, COMPUTER users, SOFTWARE reliability, CMOS integrated circuits
Abstract: Reliability is a major concern for nanoscale CMOS circuits. Degradation phenomena such as Electromigration, Negative Bias Temperature Instability, Time Dependent Dielectric Breakdown worsen with transistor scaling. Dynamic Reliability Management (DRM) techniques reduce reliability loss at runtime by constraining operating points, but they face the challenge of reducing user experience degradation while meeting a lifetime target. In this work we propose a sensor based hierarchical controller for multicore processor DRM, exploiting the major gap between the time scales of workload variations and reliability loss. We improve performance and user experience by locally relaxing reliability-induced operating point constraints, while meeting them over the large time windows relevant for reliability. With respect to the state-of-the-art, our solution guarantees timely execution of 100% of latency-critical applications, and have a 4% performance improvement over the whole lifetime. [ABSTRACT FROM AUTHOR]
Published: 2013

29. Aging-Aware Compiler-Directed VLIW Assignment for GPGPU Architectures.

Author: Rahimi, Abbas, Benini, Luca, and Gupta, Rajesh K.
Subjects: GRAPHICS processing units, ACCESSIBLE design, COMPUTER architecture, ELECTRIC faults, STATICS
Abstract: Negative bias temperature instability (NBTI) adversely affects the reliability of a processor by introducing new delay-induced faults. However, the effect of these delay variations is not uniformly spread across functional units and instructions: some are affected more (hence less reliable) than others. This paper proposes a NBTI-aware compiler-directed very long instruction word (VLIW) assignment scheme that uniformly distributes the stress of instructions with the aim of minimizing aging of GPGPU architecture without any performance penalty. The proposed solution is an entirely software technique based on static workload characterization and online execution with NBTI monitoring that equalizes the expected lifetime of each processing element by regenerating aging-aware healthy kernels that respond to the specific health state of GPGPU. We demonstrate our approach on AMD Evergreen architecture where iso-throughput executions of the healthy kernels reduce NBTI-induced voltage threshold shift up to 49% (11%) compared to naïve kernel executions, with (without) architectural support for power-gating. The kernel adaption flow takes average of 13 millisecond on a typical host machine thus making it suitable for practical implementation. [ABSTRACT FROM AUTHOR]
Published: 2013

30. Hierarchically focused guardbanding.

Author: Rahimi, Abbas, Benini, Luca, and Gupta, Rajesh K.
Published: 2013

31. Enabling fine-grained OpenMP tasking on tightly-coupled shared memory clusters.

Author: Burgio, Paolo, Tagliavini, Giuseppe, Marongiu, Andrea, and Benini, Luca
Published: 2013

32. Design of an ultra-low power device for aircraft structural health monitoring.

Author: Perelli, Alessandro, Caione, Carlo, De Marchi, Luca, Brunelli, Davide, Marzani, Alessandro, and Benini, Luca
Published: 2013

33. A survey of multi-source energy harvesting systems.

Author: Weddell, Alex S., Magno, Michele, Merrett, Geoff V., Brunelli, Davide, Al-Hashimi, Bashir M., and Benini, Luca
Published: 2013

34. Variation-tolerant OpenMP tasking on tightly-coupled processor clusters.

Author: Rahimi, Abbas, Marongiu, Andrea, Burgio, Paolo, Gupta, Rajesh K., and Benini, Luca
Published: 2013

35. SCC thermal model identification via advanced bias-compensated least-squares.

Author: Diversi, Roberto, Bartolini, Andrea, Tilli, Andrea, Beneventi, Francesco, and Benini, Luca
Published: 2013

36. Procedure hopping.

Author: Rahimi, Abbas, Benini, Luca, and Gupta, Rajesh
Published: 2012
Full Text: View/download PDF

37. Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications.

Author: Melpignano, Diego, Benini, Luca, Flamand, Eric, Jego, Bruno, Lepley, Thierry, Haugou, Germain, Clermidy, Fabien, and Dutoit, Denis
Subjects: NETWORKS on a chip, SYSTEMS on a chip, FEATURE extraction, VISUAL analytics, COMPUTER vision, COMPLEMENTARY metal oxide semiconductor design & construction, SIMULTANEOUS multithreading processors
Abstract: P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous processor clusters. Each cluster features up to 16 processors with independent instruction streams sharing a multibanked one-cycle access L1 data memory, a multi-channel DMA engine and specialized hardware for synchronization and aggressive power management. P2012 is 3D stacking ready and can be customized to achieve extreme area and energy efficiency by adding domain-specific HW IPs to the cluster. The first P2012 SoC prototype in 28nm CMOS will sample in Q3, featuring four 16-processor clusters, a 1MB L2 memory and delivering 80GOPS (with 32 bit single precision floating point support) in 18mm² with 2W power consumption (worst-case). P2012 can run standard OpenCL™ and proprietary Native Programming Model SW components to achieve the highest level of control on application-to-resource mapping. A dedicated version of the OpenCV vision library is provided in the P2012 SW Development Kit to enable visual analytics acceleration. This paper will discuss preliminary performance measurements of common feature extraction and tracking algorithms, parallelized on P2012, versus sequential execution on ARM CPUs. [ABSTRACT FROM AUTHOR]
Published: 2012

38. An energy efficient DRAM subsystem for 3D integrated SoCs.

Author: Weis, Christian, Loi, Igor, Benini, Luca, and Wehn, Norbert
Published: 2012

39. Analysis of instruction-level vulnerability to dynamic voltage and temperature variations.

Author: Rahimi, Abbas, Benini, Luca, and Gupta, Rajesh K.
Published: 2012

40. P2012.

Author: Benini, Luca, Flamand, Eric, Fuin, Didier, and Melpignano, Diego
Published: 2012

41. A resilient architecture for low latency communication in shared-L1 processor clusters.

Author: Kakoee, Mohammad Reza, Loi, Igor, and Benini, Luca
Published: 2012

42. Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs.

Author: Abellán, José L., Fernández, Juan, Acacio, Manuel E., Bertozzi, Davide, Bortolotti, Daniele, Marongiu, Andrea, and Benini, Luca
Published: 2012

43. Quantifying the impact of frequency scaling on the energy efficiency of the single-chip cloud computer.

Author: Bartolini, Andrea, Sadri, MohammadSadegh, Furst, John-Nicholas, Coskun, Ayse Kivilcim, and Benini, Luca
Published: 2012

44. Fast and lightweight support for nested parallelism on cluster-based embedded many-cores.

Author: Marongiu, Andrea, Burgio, Paolo, and Benini, Luca
Published: 2012

45. Smart power unit with ultra low power radio trigger capabilities for wireless sensor networks.

Author: Magno, Michele, Marinkovic, Stevan, Brunelli, Davide, Popovici, Emanuel, O'Flynn, Brendan, and Benini, Luca
Published: 2012

46. SoC-TM.

Author: Ferri, Cesare, Marongiu, Andrea, Lipton, Benjamin, Bahar, R. Iris, Moreshet, Tali, Benini, Luca, and Herlihy, Maurice
Published: 2011
Full Text: View/download PDF

47. Vertical stealing.

Author: Marongiu, Andrea, Burgio, Paolo, and Benini, Luca
Published: 2010
Full Text: View/download PDF

48. Battery-aware power management techniques for wearable haptic nodes.

Author: Rofouei, Mahsan, Farella, Elisabetta, Brunelli, Davide, Sarrafzadeh, Majid, and Benini, Luca
Published: 2010
Full Text: View/download PDF

49. Automatic synthesis of near-threshold circuits with fine-grained performance tunability.

Author: Kakoee, Mohammad Reza, Sathanur, Ashoka, Pullini, Antonio, Huisken, Jos, and Benini, Luca
Published: 2010
Full Text: View/download PDF

50. A virtual platform environment for exploring power, thermal and reliability management control strategies in high-performance multicores.

Author: Bartolini, Andrea, Cacciari, Matteo, Tilli, Andrea, Benini, Luca, and Gries, Matthias
Published: 2010
Full Text: View/download PDF
