Author: "Hussam Amrouch" / Publisher: Institute of Electrical and Electronics Engineers (IEEE) - Searchworks@Jio Institute Digital Library Search Results

1. Machine Learning-Based Microarchitecture-Level Power Modeling of CPUs

Author: Ajay Krishna Ananda Kumar, Sami Al-Salamin, Hussam Amrouch, and Andreas Gerstlauer
Subjects: Computational Theory and Mathematics, Hardware and Architecture, Software, Theoretical Computer Science
Published: 2023
Full Text: View/download PDF

2. HW/SW Codesign for Approximation-Aware Binary Neural Networks

Author: Abhilasha Dave, Fabio Frustaci, Fanny Spagnolo, Mikail Yayla, Jian-Jia Chen, and Hussam Amrouch
Subjects: Electrical and Electronic Engineering
Published: 2023
Full Text: View/download PDF

3. Compact CMOS-Compatible Majority Gate Using Body Biasing in FDSOI Technology

Author: Brunno Alves de Abreu, Albi Mema, Simon Thomann, Guilherme Paim, Paulo Flores, Sergio Bampi, and Hussam Amrouch
Subjects: Electrical and Electronic Engineering
Published: 2023
Full Text: View/download PDF

4. Targeting DNN Inference Via Efficient Utilization of Heterogeneous Precision DNN Accelerators

Author: Ourania Spantidi, Georgios Zervakis, Sami Alsalamin, Isai Roman-Ballesteros, Jörg Henkel, Hussam Amrouch, and Iraklis Anagnostopoulos
Subjects: Human-Computer Interaction, Computer Science (miscellaneous), Computer Science Applications, Information Systems
Published: 2023
Full Text: View/download PDF

5. FDSOI-Based Analog Computing for Ultra-Efficient Hamming Distance Similarity Calculation

Author: Albi Mema, Simon Thomann, Paul R. Genssler, and Hussam Amrouch
Subjects: Hardware and Architecture, Electrical and Electronic Engineering
Published: 2023
Full Text: View/download PDF

6. Efficient Learning Strategies for Machine Learning-Based Characterization of Aging-Aware Cell Libraries

Author: Florian Klemme and Hussam Amrouch
Subjects: Hardware and Architecture, Electrical and Electronic Engineering
Published: 2022
Full Text: View/download PDF

7. Leveraging Ferroelectric Stochasticity and In-Memory Computing for DNN IP Obfuscation

Author: Likhitha Mankali, Nikhil Rangarajan, Swetaki Chatterjee, Shubham Kumar, Yogesh Singh Chauhan, Ozgur Sinanoglu, and Hussam Amrouch
Subjects: Hardware and Architecture, Electrical and Electronic Engineering, Electronic, Optical and Magnetic Materials
Published: 2022
Full Text: View/download PDF

8. Characterizing Approximate Adders and Multipliers for Mitigating Aging and Temperature Degradations

Author: Francisco Javier Hernandez Santiago, Honglan Jiang, Hussam Amrouch, Andreas Gerstlauer, Leibo Liu, and Jie Han
Subjects: Hardware and Architecture, Electrical and Electronic Engineering
Published: 2022
Full Text: View/download PDF

9. A Novel Attack Mode on Advanced Technology Nodes Exploiting Transistor Self-Heating

Author: Nikhil Rangarajan, Johann Knechtel, Nimisha Limaye, Ozgur Sinanoglu, and Hussam Amrouch
Subjects: Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
Published: 2022
Full Text: View/download PDF

10. Trojan Detection in Embedded Systems With FinFET Technology

Author: Virinchi Roy Surabhi, Prashanth Krishnamurthy, Hussam Amrouch, Jorg Henkel, Ramesh Karri, and Farshad Khorrami
Subjects: Computational Theory and Mathematics, Hardware and Architecture, Software, Theoretical Computer Science
Published: 2022
Full Text: View/download PDF

11. GNN4REL: Graph Neural Networks for Predicting Circuit Reliability Degradation

Author: Lilas Alrahis, Johann Knechtel, Florian Klemme, Hussam Amrouch, and Ozgur Sinanoglu
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Electrical and Electronic Engineering, Cryptography and Security (cs.CR), Computer Graphics and Computer-Aided Design, Software, Machine Learning (cs.LG)
Abstract: Process variations and device aging impose profound challenges for circuit designers. Without a precise understanding of the impact of variations on the delay of circuit paths, guardbands, which keep timing violations at bay, cannot be correctly estimated. This problem is exacerbated for advanced technology nodes, where transistor dimensions reach atomic levels and established margins are severely constrained. Hence, traditional worst-case analysis becomes impractical, resulting in intolerable performance overheads. Contrarily, process-variation/aging-aware static timing analysis (STA) equips designers with accurate statistical delay distributions. Timing guardbands that are small, yet sufficient, can then be effectively estimated. However, such analysis is costly as it requires intensive Monte-Carlo simulations. Further, it necessitates access to confidential physics-based aging models to generate the standard-cell libraries required for STA. In this work, we employ graph neural networks (GNNs) to accurately estimate the impact of process variations and device aging on the delay of any path within a circuit. Our proposed GNN4REL framework empowers designers to perform rapid and accurate reliability estimations without accessing transistor models, standard-cell libraries, or even STA; these components are all incorporated into the GNN model via training by the foundry. Specifically, GNN4REL is trained on a FinFET technology model that is calibrated against industrial 14nm measurement data. Through our extensive experiments on EPFL and ITC-99 benchmarks, as well as RISC-V processors, we successfully estimate delay degradations of all paths, notably within seconds, with a mean absolute error down to 0.01 percentage points. This article will be presented in the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES) 2022 and will appear as part of the ESWEEK-TCAD special issue.
Published: 2022
Full Text: View/download PDF

12. Thermal-Aware Design for Approximate DNN Accelerators

Author: Georgios Zervakis, Iraklis Anagnostopoulos, Sami Salamin, Ourania Spantidi, Isai Roman-Ballesteros, Jorg Henkel, and Hussam Amrouch
Subjects: Computational Theory and Mathematics, Hardware and Architecture, Software, Theoretical Computer Science
Published: 2022
Full Text: View/download PDF

13. Comprehensive Variability Analysis in Dual-Port FeFET for Reliable Multi-Level-Cell Storage

Author: Swetaki Chatterjee, Simon Thomann, Kai Ni, Yogesh Singh Chauhan, and Hussam Amrouch
Subjects: Electrical and Electronic Engineering, Electronic, Optical and Magnetic Materials
Published: 2022
Full Text: View/download PDF

14. Electrothermal Simulation and Optimal Design of Thermoelectric Cooler Using Analytical Approach

Author: Hussam Amrouch, Sheldon X.-D. Tan, Sheriff Sadiqbatcha, and Liang Chen
Subjects: Thermoelectric cooling, Materials science, Multiphysics, Heat generation, TEC, Thermoelectric effect, Mechanics, Electrical and Electronic Engineering, Thermal conduction, Joule heating, Computer Graphics and Computer-Aided Design, Software, Finite element method
Abstract: In this paper, electrothermal modeling and simulation of thermoelectric cooling (TEC) in the package design of VLSI systems are performed by solving coupled heat conduction and current continuity equations. We propose a new analytical solution to the coupled partial differential equations which describe temperature and voltage with the reduction from 3D to 1D. In addition to this, we derive new analytic expressions for two key performance metrics for TEC devices: the maximum temperature difference and the maximum heat-flux pumping capability, which can guide the optimal design of a thermoelectric cooler to achieve the maximum cooling performance. Further, for the first time, we observe that when the dimensionless figure of merit ZT0 value is larger than 1, there is no maximum heat-flux value, which means the heat dissipation due to Peltier and Fourier transfer effects is larger than the heat generation caused by the Joule heating effect, which can lead to more efficient TEC cooling design. The accuracy of the proposed 1D formulas is verified by the 3D finite element method using COMSOL software. The compact model delivers many orders of magnitude speedup and memory saving compared to COMSOL with marginal accuracy loss. Compared with the conventional simplified 1D energy equilibrium model, the proposed analytical coupled multiphysics model is more robust and accurate.
Published: 2022
Full Text: View/download PDF

15. Evaluating the Robustness of Complementary Channel Ferroelectric FETs Against Total Ionizing Dose Towards Radiation-Tolerant Embedded Nonvolatile Memory

Author: Kai Ni, Ronald D. Schrimpf, Daniel M. Fleetwood, Enxia Zhang, Hussam Amrouch, Vijaykrishnan Narayanan, Santosh Kurinec, Xiao Gong, Sven Beyer, Steven Soss, Stefan Duenkel, Halid Mulaosmanovic, Zubair Faris, Munazza Sayed, Xuyi Luo, Zixiang Guo, and Zhouhang Jiang
Abstract: In this work, a thorough assessment of the robustness of complementary channel HfO2 ferroelectric FET (FeFET) against total ionizing dose (TID) radiation is conducted, with the goal of determining its suitability for use as high-performance and energy-efficient embedded nonvolatile memory (eNVM) for space applications. We demonstrate that: i) ferroelectric HfO2 thin film is robust against X-ray and proton irradiation; ii) FeFET exhibits a polarization state dependent radiation sensitivity where the high-VTH (HVT) state sees noticeable negative VTH shift and low-VTH (LVT) is immune to irradiation, irrespective of the channel type; iii) the state dependence is ascribed to the depolarization field in the HVT, which points toward the channel and facilitates the transport and trapping of radiation-generated holes close to the channel. In the future, radiation hardening techniques need to be considered.
Published: 2023
Full Text: View/download PDF

16. Reliable Binarized Neural Networks on Unreliable Beyond Von-Neumann Architecture

Author: Mikail Yayla, Simon Thomann, Sebastian Buschjager, Katharina Morik, Jian-Jia Chen, and Hussam Amrouch
Subjects: Hardware and Architecture, Electrical and Electronic Engineering
Published: 2022
Full Text: View/download PDF

17. Scalable Machine Learning to Estimate the Impact of Aging on Circuits Under Workload Dependency

Author: Florian Klemme and Hussam Amrouch
Subjects: Electrical and Electronic Engineering
Published: 2022
Full Text: View/download PDF

18. Full-Chip Power Density and Thermal Map Characterization for Commercial Microprocessors Under Heat Sink Cooling

Author: Jinwei Zhang, Hussam Amrouch, Sheldon X.-D. Tan, Michael O'Dea, and Sheriff Sadiqbatcha
Subjects: Materials science, Nuclear engineering, Thermal, Electrical and Electronic Engineering, Heat sink, Chip, Computer Graphics and Computer-Aided Design, Software, Characterization (materials science), Power density
Published: 2022
Full Text: View/download PDF

19. On the Reliability of FeFET On-Chip Memory

Author: Jorg Henkel, Victor M. van Santen, Hussam Amrouch, and Paul R. Genssler
Subjects: CPU cache, Computer science, Transistor, Theoretical Computer Science, Reliability (semiconductor), Computational Theory and Mathematics, Hardware and Architecture, Memory cell, Power consumption, Embedded system, System level, CMOS process, Software, Voltage
Abstract: FeFET is a promising technology for non-volatile on-chip memories. It is rapidly attracting ever-increasing attention from industry. The advantage of FeFETs is full compatibility with the existing CMOS process besides their low power consumption. To enable ultra-dense memories, 1-FeFET AND arrays were proposed in which a memory cell is formed from a single FeFET. All access transistors, which are traditionally needed to operate memory cells, are removed. This imposes a new reliability challenge due to indirect write disturbances. In this work, we are the first to investigate the reliability of FeFET memories from device to system level. We develop a unified model capturing the impact of both indirect disturbances and direct writes on the reliability of FeFET cells. We then investigate different array sizes, write voltages, and write methods under the effect of a wide range of workloads using CPU cache as an example of on-chip memory. We demonstrate that indirect write disturbances are the dominant effect degrading the reliability of FeFET memories. For most cells, they contribute over 90% to the overall induced degradation. This provides guidelines for researchers at both device and circuit levels to optimize the FeFET reliability further, while considering the hidden impact of workloads.
Published: 2022
Full Text: View/download PDF

20. Impact of NCFET Technology on Eliminating the Cooling Cost and Boosting the Efficiency of Google TPU

Author: Sami Salamin, Florian Klemme, Jorg Henkel, Hussam Amrouch, Yogesh Singh Chauhan, Hammam Kattan, and Georgios Zervakis
Subjects: Boosting (machine learning), Thermoelectric cooling, Artificial neural network, Computer science, Computation, Operating frequency, Automotive engineering, Theoretical Computer Science, Power (physics), Computational Theory and Mathematics, Application-specific integrated circuit, Hardware and Architecture, Software, Negative impedance converter
Abstract: Recent breakthroughs in Neural Networks (NNs) led to significant accuracy improvements. This accuracy improvement comes at the cost of an immense increase in computation demands. NNs became one of the most common and computationally intensive workloads in today's datacenters. To address these computational demands, Google announced in 2016 the Tensor Processing Unit (TPU), an advanced custom ASIC accelerator for NN inference. Two new TPU versions (v2 and v3) followed that also support training. Google TPUv3 packs immense processing power in a tiny and condensed area, leading to very high on-chip power densities and thus excessive temperature. In this work, superlattice thermoelectric cooling, which is one of the emerging on-chip cooling techniques, is considered as an advanced cooling example for Google TPU and we investigate the impact of Negative Capacitance FET (NCFET) on the cooling and efficiency of TPU. Our results demonstrate that NCFET can significantly minimize the required cooling cost. We explore all NCFET configurations including the thickness of the ferroelectric layer of NCFET, the operating voltage, cooling, and the operating frequency, in addition to all possible FinFET configurations. Moreover, our experimental evaluation shows that by eliminating the cooling cost, NCFET delivers 2.8x higher efficiency compared to the conventional FinFET baseline.
Published: 2022
Full Text: View/download PDF

21. FN-CACTI: Advanced CACTI for FinFET and NC-FinFET Technologies

Author: Divya Praneetha Ravipati, Rajesh Kedia, Victor M. Van Santen, Jorg Henkel, Preeti Ranjan Panda, and Hussam Amrouch
Subjects: Hardware and Architecture, Electrical and Electronic Engineering, Software
Published: 2022
Full Text: View/download PDF

22. A Framework for Crossing Temperature-Induced Timing Errors Underlying Hardware Accelerators to the Algorithm and Application Layers

Author: Brunno Abreu, Jorg Henkel, Sergio Bampi, Leandro Mateus Giacominni Rocha, Eduardo Costa, Guilherme Paim, and Hussam Amrouch
Subjects: Very-large-scale integration, Clock rate, Transistor, Application layer, Theoretical Computer Science, Sum of absolute differences, Computational Theory and Mathematics, Hardware and Architecture, Hardware acceleration, Algorithm, Encoder, Software, Computer hardware, Degradation (telecommunications)
Abstract: Rising temperature is an unavoidable effect in VLSI and has always been a critical issue in any system-on-chip, especially when targeting compute-intensive applications. This effect increases the delay in hardware accelerators, resulting in timing errors due to unsustainable clock frequency, whose impact must be carefully evaluated at design time to measure the performance degradation of the hardware accelerator. Many algorithms, such as in multimedia and machine learning applications, are capable of tolerating hardware errors. Yet, these algorithms have a dynamic behavior (i.e., closed-loop) where a timing error can be propagated, affecting subsequent steps. Measuring the degradation-induced errors in these applications is very challenging given that an accurate gate-level simulation to investigate degradation-induced timing errors needs to be coupled dynamically with a system-level simulator to unveil how induced errors in the underlying hardware ultimately impact the algorithm execution in the hardware accelerator. This is the first work to achieve this goal. State-of-the-art works have studied accelerators under timing errors when removing (or narrowing) guardbands. However, their approach was suitable only for open-loop hardware accelerators which are entirely agnostic of complex interactions of the algorithms. This paper investigates temperature- and aging-induced timing errors in the joint accelerator-algorithm interactions and their runtime impacts. Our framework investigates aging effects across the different layers starting from transistor physics all the way up to the algorithm layer. The hardware accelerator employed as a case study in this work is the sum of absolute differences (SAD), which is the most compute-intensive accelerator in commercial video encoders for mobile applications. Our results demonstrate the runtime behavior impacts of three advanced block-matching algorithms of the video encoder in a joint operation by a SAD accelerator under timing errors induced by temperature and aging effects considering a 14nm FinFET technology.
Published: 2022
Full Text: View/download PDF
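
The sum of absolute differences (SAD) named in the abstract of record 22 is the block-matching cost commonly used in video motion estimation. The following is a minimal software sketch of that cost only, not the authors' hardware accelerator or framework; the function names `sad` and `best_match` and the list-of-rows block representation are assumptions made for illustration.

```python
from typing import Sequence

Block = Sequence[Sequence[int]]  # hypothetical representation: rows of pixel values


def sad(block_a: Block, block_b: Block) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    same_rows = len(block_a) == len(block_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(block_a, block_b)):
        raise ValueError("blocks must have the same shape")
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def best_match(current: Block, candidates: Sequence[Block]) -> int:
    """Index of the candidate block with the lowest SAD cost (first one on ties)."""
    costs = [sad(current, c) for c in candidates]
    return min(range(len(costs)), key=costs.__getitem__)
```

Because a block-matching search picks the candidate with the lowest cost, an erroneous SAD value can change which block is selected, which is the closed-loop error propagation the abstract describes.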

23. Comprehensive Modeling of Switching Behaviour in BEOL FeFET for Monolithic 3D Integration

Author: Hussam Amrouch, Yogesh Singh Chauhan, Kai Ni, Om Prakash, Simon Thomann, and Shubham Kumar
Abstract: We have developed a comprehensive modeling framework to explain the switching characteristics of BEOL-compatible FeFET with an amorphous IGZO channel. Our TCAD-based modeling framework, calibrated against measurement data, jointly incorporates a) the distributed channel, b) a physics-based nucleation-limited switching dynamics model for multi-domain ferroelectric polarization (PFE) and c) the domain-domain interaction. To our knowledge, this is the first demonstration of a physics-based comprehensive model of BEOL-compatible FeFET. Our model reproduces and explains the experimentally observed abrupt current jumps in the reverse and forward DC sweeps. Further, our model is capable of processing arbitrary input waveforms such as quasi-DC and different kinds of pulse trains used in neuromorphic applications. This comprehensive modeling framework would enable researchers to explore the BEOL FeFET applications and guide device optimization and development.
Published: 2023
Full Text: View/download PDF

24. Bridging the Gap Between Voltage Over-Scaling and Joint Hardware Accelerator-Algorithm Closed-Loop

Author: Eduardo Costa, Guilherme Paim, Hussam Amrouch, Sergio Bampi, and Jorg Henkel
Subjects: Reduction (complexity), Sum of absolute differences, Computer science, Quality of service, Encoding (memory), Clock rate, Media Technology, Hardware acceleration, Electrical and Electronic Engineering, Encoder, Algorithm, Data compression
Abstract: Voltage over-scaling (VOS) optimizes energy while causing timing errors due to an unsustainable clock frequency. Many algorithms, such as in multimedia and machine learning applications, are capable of tolerating such errors. VOS has never been investigated in hardware accelerators running closed-loop algorithms. As the errors impact most decisions and actions in the subsequent steps, closed loops dynamically change the execution flow. Timing errors should be evaluated by an accurate gate-level simulation, but a large gap still remains: how do these timing errors propagate from the underlying hardware all the way up to the entire algorithm run, where they may degrade the performance and quality of service of the application at stake? This paper tackles this issue by presenting a framework for VOS investigation, embracing any kind of application. Our framework simulates the VOS-induced timing errors at gate level, dynamically linking the hardware result with the algorithm and vice versa during the evolution of the runtime of the application. The state-of-the-art VOS literature for video encoding applications fails to assess the ultimate impacts of VOS-induced timing errors, as current works open the encoding loops. Unlike those, our work investigates the ultimate impact of a hardware accelerator dynamically carrying through to the video encoder all VOS-induced timing errors and preserving full compliance with the standard. We employ a parallel sum of absolute differences (SAD) hardware accelerator as a case study. We assess the performance of the overall encoder under varying timing guardbands. Next, it is demonstrated that, under VOS, the ultimate impact on compression efficiency is related to the video’s motion intensity. Additionally, the advantages of controlled timing guardband reduction are clearly quantified in our results by virtue of the framework. With a clock-frequency reduction of at most 9.5%, energy savings of up to 16.5% in energy/operation are achieved in SAD for video compression.
Published: 2022
Full Text: View/download PDF

25. Towards a New Thermal Monitoring Based Framework for Embedded CPS Device Security

Author: Hussam Amrouch, Naman Patel, Michael Shamouilian, Farshad Khorrami, Ramesh Karri, Prashanth Krishnamurthy, and Jorg Henkel
Subjects: Computer science, Computation, Testbed, Real-time computing, Software, Control system, Thermal, Code (cryptography), Thermal monitoring, Side channel attack, Electrical and Electronic Engineering
Abstract: This paper introduces a methodology to use the thermal side channel as a proxy for the behavior of embedded processors to detect changes in this behavior in a cyber-physical system. Such changes may be due to software attacks, hardware attacks, and altered processors. Since control system processes are periodic computations, the thermal side channel signals exhibit a temporal pattern. This enables detection of altered code and changed device characteristics. We present a machine learning approach to estimate the activity of the embedded device from the time sequence of thermal images and show the extent to which deviations from expected behavior can be detected. The approach is validated on a testbed of a multi-core processor running a periodic computational code. The infrared imager directly collects thermal imagery from the processor, which is cooled from the backside. While an external infrared imager is used in this study, it is desirable to deploy a finite number of on-chip temperature sensors. This paper shows that integrating on-chip temperature sensors allows robust real-time monitoring of the processor behavior. Finally, we also offer a machine learning approach to find the optimal locations of the on-chip sensors to aid detection.
Published: 2022
Full Text: View/download PDF

26. PROTON: Post-Synthesis Ferroelectric Thickness Optimization for NCFET Circuits

Author: Sami Salamin, Hussam Amrouch, Georgios Zervakis, Jorg Henkel, and Yogesh Singh Chauhan
Subjects: Materials science, Transistor, Hardware_PERFORMANCEANDRELIABILITY, Capacitance, Capacitor, CMOS, Logic gate, Hardware_INTEGRATEDCIRCUITS, Netlist, Optoelectronics, Electrical and Electronic Engineering, Hardware_LOGICDESIGN, Negative impedance converter, Electronic circuit
Abstract: For the first time, we demonstrate an optimization technique to synthesize circuits in the Negative Capacitance FET (NCFET) technology. NCFET is a rapidly emerging technology to replace the currently employed CMOS technology due to its profound ability to overcome the fundamental limit in scaling along with its full compatibility with the existing fabrication process. This is achieved by replacing the traditional transistor gate dielectric with a ferroelectric layer that manifests itself as a Negative Capacitance (NC), which magnifies the electric field. As a result, NCFET-based circuits can operate at a higher clock frequency without the need to increase the operating voltage. NC breaks one of the fundamental laws in physics in which the total capacitance of two capacitors connected in series becomes larger (instead of smaller, as with ordinary capacitors) than each of them. This could lead to sub-optimal netlists, suffering from a significant increase in dynamic power and IR-drops. To suppress that, we employ the relation between delay decrease and capacitance increase of gates w.r.t. ferroelectric thickness. Our technique takes an optimized netlist, obtained from commercial EDA tools, and then selectively determines the optimal ferroelectric thickness for each gate in the netlist, so that the maximum performance provided by NCFET is still achieved while the dynamic power is considerably decreased (45% on average), i.e., no trade-offs. Particularly, our technique enables the full exploitation of the performance benefits originating from NCFET, at a significantly lower (power) cost. Compared to the state of the art, our technique decreases the energy-delay-product of circuits by 25% on average and reduces the deleterious effects of IR-drop by 56%. Hence, efficiency and reliability of circuits are improved without any loss in the obtained performance from NCFET.
Published: 2021
Full Text: View/download PDF
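
The series-capacitance statement in the abstract of record 26 can be illustrated with the standard series formula. This is a textbook-level sketch, not the paper's device model: the symbols $C_{FE}$ (ferroelectric layer) and $C_{MOS}$ (underlying transistor gate), and the condition $|C_{FE}| > C_{MOS}$, are assumptions introduced here for illustration.

```latex
% Two capacitors in series:
\frac{1}{C_{\text{total}}} = \frac{1}{C_{FE}} + \frac{1}{C_{MOS}}
\quad\Longrightarrow\quad
C_{\text{total}} = \frac{C_{FE}\, C_{MOS}}{C_{FE} + C_{MOS}}

% Assumed: negative ferroelectric capacitance, C_{FE} = -|C_{FE}|, with |C_{FE}| > C_{MOS} > 0:
C_{\text{total}} = \frac{|C_{FE}|\, C_{MOS}}{|C_{FE}| - C_{MOS}} > C_{MOS}
```

For two positive capacitances the same formula always gives a total smaller than either one, which is the contrast the abstract draws.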

27. Power-Efficient Heterogeneous Many-Core Design With NCFET Technology

Author: Anuj Pathania, Jorg Henkel, Sami Salamin, Hussam Amrouch, Arka Maity, Martin Rapp, and Tulika Mitra
Subjects: Amdahl's law, Transistor, Capacitance, Theoretical Computer Science, Microarchitecture, Beyond CMOS, Computational Theory and Mathematics, CMOS, Hardware and Architecture, Logic gate, Electronic engineering, Electrical efficiency, Software
Abstract: Multi-/many-core, homogeneous or heterogeneous architectures, using the existing CMOS technology are inevitably approaching the limit of attainable power efficiency due to the fundamental limits in scaling. Negative Capacitance Field-Effect Transistor (NCFET) is rapidly emerging as an alternative technology that promises a multi-fold increase in the power efficiency of transistors, yet is compatible with the existing CMOS fabrication process. NCFET incorporates a ferroelectric (FE) layer within the transistor's gate stack, which exhibits a negative capacitance effect amplifying the internal voltage. NCFET has been studied in detail in both physics and devices/circuits communities where its superiority has been demonstrated in semiconductor measurements. However, the full promise of NCFET remains unmodeled and unquantified unless the research is further continued to the microarchitecture and system levels. This article, for the first time, explores system- and application-level benefits of NCFET-based multi-/many-core designs in terms of performance and power-efficiency compared to state-of-the-art FinFET-based designs. This exploration is done first through analytical modeling in which we extend Amdahl's law for NCFET multi-/many-cores, and then through quantitative modeling. The latter is achieved through RTL- and system-level simulations of NCFET-based multi-cores. The analytical modeling shows that a novel type of technology-based heterogeneity in which cores with the same microarchitecture but different FE thickness are combined is highly beneficial. Our exploration shows that this novel heterogeneity increases the power-efficiency by up to 3.5× over homogeneous systems and even achieves 8.3% better performance and 20% higher power-efficiency than conventional heterogeneity in the microarchitecture without having to cope with the complexity of managing different microarchitectures.
Published: 2021
Full Text: View/download PDF
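
For reference, the classical form of Amdahl's law that the abstract of record 27 says it extends is shown below. The NCFET-specific extension itself is not reproduced here; the symbols $p$ (parallelizable fraction of the workload) and $N$ (number of cores) are notation assumed for this sketch.

```latex
% Classical Amdahl's law: speedup on N cores when a fraction p of the work is parallelizable
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```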

28. Cross-Layer Reliability Modeling of Dual-Port FeFET: Device-Algorithm Interaction

Author: Shubham Kumar, Swetaki Chatterjee, Simon Thomann, Yogesh Singh Chauhan, and Hussam Amrouch
Subjects: Hardware and Architecture, Electrical and Electronic Engineering
Abstract: Today's data-centric applications are incompatible with the predominant compute-centric computer architectures. The small on-chip memories of compute-centric computer architecture demand many energy-costly data transfers exposing the von-Neumann bottleneck. The Ferroelectric Field-Effect Transistor (FeFET) is an emerging Non-Volatile Memory technology enabling novel data-centric architectures that go far beyond von-Neumann principles. FeFETs are very promising for a wide range of applications starting from on-chip memories to in-memory computing and even neuromorphic computing. Nevertheless, FeFET devices exhibit significant variations that can severely restrict their applicability. Temperature further exacerbates variation effects because it degrades ferroelectric parameters. Hence, it is indispensable to investigate and model design-time variations, run-time variations, and stochastic variations due to spatial fluctuation of ferroelectric domains under different temperatures. Dual-port FeFET has been recently proposed and demonstrated as a novel structure that offers for the first time disturb-free read operation along with a larger memory window (MW) compared to conventional FeFETs. However, all of the aforementioned types of variations are amplified in such a new structure. In this work, the impact of temperature variation is analyzed for dual-port FeFETs for the first time in a cross-layer manner starting from the device level to the circuit/system levels and compared to conventional FeFET. We focus in our analysis on Hyperdimensional Computing (HDC), which is an emerging type of machine learning algorithm, that is being executed on top of FeFET-based in-memory circuits that perform efficient Hamming distance (i.e., similarity) computations. Through our cross-layer framework, we demonstrate the serious impact of variation on FeFET reliability despite the significant increase in the MW that dual-port FeFET offers. Even HDC is affected, despite its remarkable robustness against errors. All in all, our work reveals that a larger MW at the device level does not necessarily translate to benefits at the application level. Hence, investigating and modeling variability effects in a cross-layer manner is indispensable.
Published: 2022
Full Text: View/download PDF
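
The Hamming-distance similarity computation mentioned in the abstract of record 28 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the paper's FeFET in-memory circuit: hypervectors are represented as Python integers used as bit strings, and the names `hamming_distance` and `classify` are hypothetical.

```python
from typing import Dict


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary vectors differ."""
    return bin(a ^ b).count("1")


def classify(query: int, class_vectors: Dict[str, int]) -> str:
    """Return the label whose stored vector is closest to the query in Hamming distance."""
    return min(class_vectors, key=lambda label: hamming_distance(query, class_vectors[label]))
```

A nearest-vector lookup of this kind tolerates some flipped bits as long as the correct class remains the closest one, which is consistent with the robustness the abstract attributes to HDC.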

29. On-Demand Mobile CPU Cooling With Thin-Film Thermoelectric Array

Author: Jorg Henkel, Sung Woo Chung, Hussam Amrouch, and Hammam Kattan
Subjects: Mobile processor, Multi-core processor, Computer science, Electrical engineering, Chip, Temperature measurement, Hardware and Architecture, Thermoelectric effect, System on a chip, Electrical and Electronic Engineering, Energy harvesting, Software, Energy (signal processing)
Abstract: On-demand cooling is inevitable to maximize the processor’s performance while fulfilling thermal constraints; this holds even more in advanced technologies, where localized hotspots change during runtime. In this work, we propose to adopt an array of thin-film thermoelectric (TE) devices, which is integrated within the chip packaging, for both cooling and energy-harvesting purposes. Each TE device within the array can be enabled during runtime either for energy harvesting or on-demand cooling. Our approach is implemented and evaluated using a mature finite element analysis tool in which a commercial multicore mobile chip is modeled after careful calibrations together with state-of-the-art TE devices. Results demonstrate that our approach reduces the peak temperature by up to 24 °C and the average temperature by 10 °C across various benchmarks at a cost of 67.5 mW. Additionally, the harvested energy from the array of TE devices compensates for 89% of the required cooling energy.
Published: 2021
Full Text: View/download PDF

30. Machine Learning for On-the-Fly Reliability-Aware Cell Library Characterization

Author: Hussam Amrouch and Florian Klemme
Subjects: Standard cell, Computer science, Reliability (computer networking), Static timing analysis, Inference, Machine learning, Orders of magnitude (bit rate), Dynamic demand, Artificial intelligence, State (computer science), Electrical and Electronic Engineering, Degradation (telecommunications)
Abstract: Aging-induced degradation poses a major challenge to designers when estimating timing guardbands. The problem grows as traditional worst-case corners bring over-pessimism, which penalizes competitive and close-to-the-edge designs. In this work, we present an accurate machine learning approach for aging-aware cell library characterization, enabling designers to evaluate their circuits under the impact of precisely selected degradation. Unlike the state of the art, we bring cell library characterization to the designer, empowering them to explore the impact of aging while at the same time protecting confidential foundry information. Furthermore, the fast inference of cell libraries makes it feasible, for the first time, to perform aging-induced variability analysis in a Monte Carlo fashion. Finally, we show that the designer is able to select a less pessimistic timing guardband by choosing an adequate delta threshold voltage ($\Delta V_{th}$) for their design and their needs. Our machine learning approach reaches an $R^{2}$ score of $>99\%$ for almost all data stored in the cell library. Only timing constraints show slightly lower accuracy, with an $R^{2}$ score around 95%. When using ML-characterized libraries in static timing analysis, we achieve errors smaller than $\pm 0.5\%$ and $\pm 0.1\%$ for path delay and dynamic power, respectively. Errors in leakage power are negligible, smaller by orders of magnitude. Our machine learning implementation for standard cell library characterization is publicly available. Download: https://opensource.mlcad.org
Published: 2021
Full Text: View/download PDF

31. Post-Silicon Heat-Source Identification and Machine-Learning-Based Thermal Modeling Using Infrared Thermal Imaging

Author: Sheldon X.-D. Tan, Jinwei Zhang, Hengyang Zhao, Jorg Henkel, Sheriff Sadiqbatcha, and Hussam Amrouch
Subjects: Artificial neural network, Infrared, Computer science, Engineering and technology, Solid modeling, Computer Graphics and Computer-Aided Design, Computer hardware & architecture, Computational science, Transformation (function), Thermal, Electrical engineering, electronic engineering, information engineering, Overhead (computing), Electrical and Electronic Engineering, Cluster analysis, Software
Abstract: In this article, we present a novel post-silicon approach to locating the dominant heat sources on commercial multicore processors using heatmaps measured via an infrared (IR) thermal imaging setup. To locate the heat sources, a 2-D spatial Laplacian transformation is performed on the heatmaps, followed by $K$-means clustering to find the dominant power/heat-source clusters. This is an exclusively post-silicon approach that does not require any knowledge of the underlying design of the commercial chips other than the information that is publicly available. Since the identified clusters are the thermally vulnerable areas on the die, we then propose a machine-learning-based framework for deriving a thermal model capable of estimating their temperatures during online use. Our approach involves collecting transient temperature data of the aforementioned heat sources and synchronized high-level performance metrics from the chip, and training a long short-term memory (LSTM) neural network (NN) that uses the performance metrics as inputs to estimate the temperatures of the identified heat sources in real time. Since the model is meant for real-time use, we explore methods of reducing the performance overhead and inference time of the model. These include a novel power correlation-based approach to identifying the thermally irrelevant performance metrics and eliminating them in order to reduce the input dimensionality of the model, and an analysis of network sizing to determine the ideal NN configuration for the problem at hand. The model is trained and tested exclusively using measured thermal data from commercial multicore processors. The experimental results from two Intel multicore processors (i5-3337U and i7-8650U) show that the proposed approach achieves very high accuracy (root-mean-square error: 0.55 °C–0.93 °C) in estimating the temperatures of all the identified heat sources on the chip.
Published: 2021
Full Text: View/download PDF

32. Impact of Self-Heating on Negative-Capacitance FinFET: Device-Circuit Interaction

Author: Chetan Kumar Dabhi, Om Prakash, Yogesh Singh Chauhan, Girish Pahwa, and Hussam Amrouch
Subjects: Applied physics, Materials science, Ring oscillator, Natural sciences, Capacitance, Electronic, Optical and Magnetic Materials, Computational physics, Thermal conductivity, Logic gate, Physical sciences, Thermal, Electrical and Electronic Engineering, Communication channel, Negative impedance converter, Voltage
Abstract: In this work, we analyze the impact of self-heating effects (SHEs) on 14-nm negative capacitance (NC)-FinFET performance from the device to the circuit level. 3-D thermal TCAD simulations, after careful calibration with measurements, are performed to analyze the impact of SHE over a broad range of frequencies. Furthermore, we use the TCAD-calibrated BSIM-CMG model to analyze the impact of SHE in NC-FinFETs at the circuit level, after including a physics-based model to capture the NC effect. For the first time, we analyze the impact of a nonuniform distribution of the temperature dissipated from the channel region to the gate stack in NC-FinFETs. Owing to the thermally insulating properties of the gate stack, the ferroelectric (FE) layer is found to be cooler than the channel region under the impact of SHE. We demonstrate that neglecting this and, hence, using the channel temperature to evaluate the temperature-dependent parameter $\alpha$ (in the Landau–Khalatnikov model of the NC effect) of the FE layer results in a significant overestimation of SHE-induced degradations, such as in the NC voltage gain. Based on our TCAD analysis, we propose a relation between the gate-stack temperature and the channel temperature and use it to accurately model the $\alpha$ parameter and, hence, SHE in NC-FinFETs. SHE is found to dominate for both FinFET and NC-FinFET in the gigahertz range, which eventually degrades performance at the circuit level, as further confirmed by ring oscillator (RO) simulations.
Published: 2021
Full Text: View/download PDF

33. On the Resiliency of NCFET Circuits Against Voltage Over-Scaling

Author: Eduardo Costa, Hussam Amrouch, Georgios Zervakis, Yogesh Singh Chauhan, Jorg Henkel, Sergio Bampi, Guilherme Paim, and Girish Pahwa
Subjects: Computer science, Transistor, Hardware: Performance and Reliability, Capacitance, CMOS, Logic gate, Hardware: Integrated Circuits, Electronic engineering, Field-effect transistor, Electrical and Electronic Engineering, Electronic circuit, Voltage, Negative impedance converter
Abstract: Approximate computing is established as a design alternative to improve the energy requirements of a vast number of applications, leveraging their intrinsic error tolerance. Voltage over-scaling (VOS) is one of the most energy-efficient approximation techniques, but its exploitation is still limited due to the large errors it induces. In this work, we investigate, for the first time, the resiliency of negative capacitance transistor (NCFET) technology to VOS in comparison to conventional CMOS technology. Our work reveals that circuits implemented using NCFET technology exhibit far fewer timing errors under VOS due to the inherent voltage amplification provided by the ferroelectric layer. NCFET is a very promising, rapidly evolving emerging technology for low-power circuits, as it enables transistors to switch faster without the need to increase the voltage. We demonstrate how NCFET technology allows circuit designers to effectively employ VOS to boost the efficiency of their approximate circuits while still keeping the induced errors marginal. Our analysis shows that the VOS resilience of NCFET circuits enables maximizing the voltage decrease; thus, NCFET-based VOS approximate circuits achieve from $1.83\times$ up to $2.78\times$ higher energy reduction compared to the corresponding FinFET circuits for the same error bounds.
Published: 2021
Full Text: View/download PDF

34. Characterizing the Thermal Feasibility of Monolithic 3D Microprocessors

Author: Joonho Kong, Ji Heon Lee, Young-Ho Gong, Hussam Amrouch, Young Seo Lee, Sung Woo Chung, and Jeong Hwan Choi
Subjects: Offset (computer science), General Computer Science, Computer science, on-chip temperature, Clock rate, General Engineering, Hardware: Performance and Reliability, Energy consumption, Monolithic 3D integration, Automotive engineering, TK1-9971, Microprocessor, cooling solution, Thermal, Hardware: Integrated Circuits, General Materials Science, System on a chip, Electrical engineering. Electronics. Nuclear engineering, thermal feasibility, Efficient energy use, Power density
Abstract: Monolithic 3D (M3D) integration reduces wire length, which improves energy efficiency and performance compared to 2D integration. However, 3D integration inevitably causes higher on-chip temperature than 2D integration due to the increased power density as well as worse heat dissipation. The high on-chip temperature may offset the benefits of M3D microprocessors for the following reasons: 1) high on-chip temperature increases leakage power, which degrades energy efficiency; and 2) the actual clock frequency is limited at run-time by frequent dynamic thermal management (DTM) invocations. In this paper, for the first time, we explore the thermal feasibility (whether it is possible to achieve high energy efficiency and performance without exceeding the threshold temperature) of M3D microprocessors depending on cooling solutions. For the thermal feasibility study, we construct an integrated framework to investigate the thermal behaviors and thermal feasibility of different types of microprocessors (M3D, 2D, and through-silicon-via-based 3D (TSV-3D)) with different cooling solutions. Our thermal-aware evaluation results show that the best configuration of the M3D microprocessors reduces average energy consumption by 27.6% compared to the 2D microprocessor at iso-frequency (4.0 GHz). In addition, at the highest clock frequencies satisfying both design and thermal constraints, the best configuration of the M3D microprocessors improves average system performance by 25.1% and 26.0% compared to the 2D and TSV-3D microprocessors, respectively.
Published: 2021
Full Text: View/download PDF

35. Weight-Oriented Approximation for Energy-Efficient Neural Network Inference Accelerators

Author: Jorg Henkel, Iraklis Anagnostopoulos, Hussam Amrouch, Zois-Gerasimos Tasoulas, and Georgios Zervakis
Subjects: Artificial neural network, Computer science, Electrical & electronic engineering, Inference, Engineering and technology, Energy consumption, Object detection, Computer hardware & architecture, Convolution, Computer engineering, Feature (computer vision), Electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Throughput (business), Efficient energy use
Abstract: Current research in the area of Neural Networks (NNs) has resulted in performance advancements for a variety of complex problems. In particular, embedded system applications rely more and more on convolutional NNs to provide services such as image/audio classification and object detection. The core arithmetic computation performed during NN inference is the multiply-accumulate (MAC) operation. In order to meet ever tighter throughput constraints, NN accelerators integrate thousands of MAC units, resulting in a significant increase in power consumption. Approximate computing is established as a design alternative to improve the efficiency of computing systems by trading computational accuracy for high energy savings. In this work, we bring approximate computing principles and NN inference together by designing NN-specific approximate multipliers that feature multiple accuracy levels at run-time. We propose a time-efficient automated framework for mapping the NN weights to the accuracy levels of the approximate reconfigurable accelerator. The proposed weight-oriented approximation mapping is able to satisfy tight accuracy loss thresholds while significantly reducing energy consumption without any need for intensive NN retraining. Our approach is evaluated against several NNs, demonstrating that it delivers high energy savings (17.8% on average) with a minimal loss in inference accuracy (0.5%).
Published: 2020
Full Text: View/download PDF

36. NPU Thermal Management

Author: Georgios Zervakis, Sami Salamin, Jorg Henkel, Hammam Kattan, Hussam Amrouch, and Iraklis Anagnostopoulos
Subjects: Computer science, Multiphysics, Joule, Static timing analysis, Engineering and technology, Thermal management of electronic devices and systems, Computer Graphics and Computer-Aided Design, Computer hardware & architecture, Power (physics), Computational science, Thermoelectric effect, Thermal, Electrical engineering, electronic engineering, information engineering, System on a chip, Electrical and Electronic Engineering, Frequency scaling, Throughput (business), Software
Abstract: Neural processing units (NPUs) are becoming an integral part of all modern computing systems due to their substantial role in accelerating neural networks (NNs). The significant improvements in cost-energy-performance stem from the massive array of multiply-accumulate (MAC) units that remarkably boosts the throughput of NN inference. In this work, we are the first to investigate the thermal challenges that NPUs bring, revealing how MAC arrays, which form the heart of any NPU, impose serious thermal bottlenecks on on-chip systems due to their excessive power densities. For the first time, we explore: 1) the effectiveness of precision scaling and frequency scaling (FS) in temperature reductions and 2) how advanced on-chip cooling using superlattice thin-film thermoelectric (TE) devices opens the door to new tradeoffs between temperature, throughput, cooling cost, and inference accuracy in NPU chips. Our work unveils that hybrid thermal management, which combines different means to reduce the NPU temperature, is key. To achieve that, we propose and implement the PFS-TE technique, which couples precision scaling and FS together with superlattice TE cooling for effective NPU thermal management. Using commercial signoff tools, we obtain accurate power and timing analysis of MAC arrays after a full-chip design is performed based on 14-nm Intel FinFET technology. Then, multiphysics simulations using finite-element methods are carried out for accurate heat simulations in the presence and absence of on-chip cooling. Afterward, a comprehensive design-space exploration is presented to demonstrate the Pareto frontier and the existing tradeoffs between temperature reductions, power overheads due to cooling, throughput, and inference accuracy. Using a wide range of NNs trained for image classification, experimental results demonstrate that our novel NPU thermal management increases the inference efficiency (TOPS/Joule) by $1.33\times$, $1.87\times$, and $2\times$ under temperature constraints of 105 °C, 85 °C, and 70 °C, respectively, while the average accuracy drops merely from 89.0% to 85.5%.
Published: 2020
Full Text: View/download PDF

37. Exposing Hardware Trojans in Embedded Platforms via Short-Term Aging

Author: Jorg Henkel, Farshad Khorrami, Hussam Amrouch, Ramesh Karri, Virinchi Roy Surabhi, and Prashanth Krishnamurthy
Subjects: Standard cell, Hardware security module, Computer science, Engineering and technology, Integrated circuit, Computer Graphics and Computer-Aided Design, Computer hardware & architecture, Trojan, Hardware: Integrated Circuits, Electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Critical path method, Software, Computer hardware
Abstract: We demonstrate a novel technique that employs transistor short-term aging effects in integrated circuits (ICs) to detect hardware Trojans in embedded systems. In advanced technology nodes (≤ 45 nm), voltage scaling in combination with short-term aging opens the door to short-term degradations. The induced short-term degradations result in dynamic variation of delays along various paths within the IC. Aging degradation generated under fast voltage switching from high to low results in bit errors at the circuit output. Our experiments use short-term aging-aware standard cell libraries to show the effectiveness of short-term aging in detecting hardware Trojans. We extract a rich set of features that capture bit error patterns at the outputs of the IC. We use a one-class SVM-based classifier that uses these features to learn the distribution of bit errors at the outputs of a clean IC. We discern the deviation in the pattern of bit errors due to a Trojan in the IC from the baseline distribution. To reiterate, the method uses the model of a clean IC. Furthermore, it is robust against chip-to-chip variations. We illustrate the technique on six Trojans from Trust-Hub spanning two cryptographic chips and an embedded PIC microcontroller. Our approach detects Trojans with an accuracy ≥ 95%. It is easier to detect Trojans in an optimized-netlist circuit, as more paths are close to the critical path. Even when the circuit is not optimized (i.e., when very few paths are close to the critical path), short-term aging plus mild overclocking can detect Trojans with high accuracy.
Published: 2020
Full Text: View/download PDF

38. Dynamic Power and Energy Management for NCFET-Based Processors

Author: Martin Rapp, Hussam Amrouch, Jorg Henkel, Andreas Gerstlauer, and Sami Salamin
Subjects: Operating point, Energy management, Computer science, Engineering and technology, Energy consumption, Computer Graphics and Computer-Aided Design, Computer hardware & architecture, Reliability engineering, Power (physics), Dynamic demand, Electrical engineering, electronic engineering, information engineering, Benchmark (computing), Electrical and Electronic Engineering, Software, Energy (signal processing), Efficient energy use
Abstract: Power and energy consumption are the key optimization goals in all modern processors. Negative capacitance field-effect transistors (NCFETs) are a leading emerging technology that promises outstanding performance in addition to better energy efficiency. The thickness of the added ferroelectric layer, as well as frequency and voltage, are the key parameters that impact the power and energy of NCFET-based processors, in addition to the characteristics of runtime workloads. Unlike in existing CMOS technologies, operating NCFET-based processors at a higher frequency than the required minimum can result in power/energy minimization. The optimal operating point, however, strongly depends on dynamic workload characteristics and technology parameters. In this work, we propose and implement the first NCFET-aware power and energy management approach that minimizes the processor’s power and energy through optimal voltage/frequency selection under different runtime scenarios. Such an NCFET-aware approach does not result in any tradeoff between power/energy and performance. Instead, it can achieve higher performance while minimizing energy. A comprehensive, simulation-based evaluation of our runtime management under realistic workloads demonstrates up to 58% energy saving with $2.1\times$ higher performance, and 46% power saving compared to conventional NCFET-unaware management techniques, over the total execution of a benchmark. Compared to state-of-the-art NCFET-aware management techniques, our technique provides up to 49% energy saving and 32% power saving.
Published: 2020
Full Text: View/download PDF

39. A Cross-Layer Gate-Level-to-Application Co-Simulation for Design Space Exploration of Approximate Circuits in HEVC Video Encoders

Author: Jorg Henkel, Eduardo Costa, Leandro M. G. Rocha, Guilherme Paim, Hussam Amrouch, and Sergio Bampi
Subjects: Adder, Design space exploration, Computer science, Logic gate, Electrical engineering, electronic engineering, information engineering, Media Technology, Artificial intelligence & image processing, Engineering and technology, Electrical and Electronic Engineering, Encoder, Algorithm, Electronic circuit
Abstract: A cross-layer design space exploration (DSE) method based on a proposed co-simulation technique is presented herein. The proposed method is demonstrated by evaluating the impact on both coding efficiency and power dissipation of applying distinct approximate logic operators in a sum of absolute differences (SAD) kernel that accelerates an H.265/HEVC (high-efficiency video coding) encoder. The proposed method simulates the gate-level circuit dynamically inside the application, yielding realistic results for the impact of the adder-tree approximate logic implementation on both quality and encoder bit rate. A comprehensive DSE is shown herein, with 13 types from 6 classes of approximate adders in the SAD accelerator hardware blocks. Over 3,000 logic variants of approximations at gate level were developed. Actual video sequences as inputs to the x265 software encoder are co-simulated to dynamically capture the video motion-estimation (ME) behavior in the presence of logic approximations. Whereas prior art only estimates the impact of approximate logic on power, area, and quality for static designs under statistical assumptions, which are agnostic to the actual data-dependent behavior of the algorithm in the application, our method accurately explores the trade-off between power dissipation and coding efficiency dynamically over the entire HEVC encoding. Our approach shows that the lower-part-OR and error-tolerant adder I approximate adders, as well as truncation-to-zero, deliver better compression-power trade-offs, with substantial differences from the static analysis.
Published: 2020
Full Text: View/download PDF

40. On the Workload Dependence of Self-Heating in FinFET Circuits

Author: Victor M. van Santen, Hussam Amrouch, Pooja Kumari, and Jorg Henkel
Subjects: Applied physics, Digital electronics, Computer science, Transistor, Clock rate, Electrical engineering, Engineering and technology, Natural sciences, Computer hardware & architecture, Duty cycle, Physical sciences, Electrical engineering, electronic engineering, information engineering, Netlist, Node (circuits), Electrical and Electronic Engineering, Electronic circuit, Communication channel
Abstract: The self-heating effect (SHE) is a major reliability concern in current and upcoming technology nodes because it increases the transistor’s channel temperature, degrading key electrical characteristics such as carrier mobility. In this brief, we study SHE in a full processor at the 7-nm FinFET technology node. This is the first work to analyze the impact that workloads executed on top of processors have on stimulating SHE. As a matter of fact, SHE in transistors is driven by the workload-induced switching activities. When it comes to evaluating SHE, the state of the art typically assumes that the switching frequency $f_{sw}$ and the operating clock frequency $f_{clk}$ of a circuit are the same, concluding that SHE is not a concern in digital circuits that operate in the GHz range, like processors. After analyzing a wide range of workloads, our investigation revealed that the majority of transistors in the processor’s netlist exhibit a switching frequency in the kHz range even though the processor’s clock is in the GHz range. This is because the majority of transistors are within the data paths, and hence their switching is driven by the workload data and not by the clock itself. In addition, we demonstrate for the first time the important role that the duty cycle (on/off ratio) induced by the running workload plays in modeling SHE. All in all, the relatively low switching activities together with skewed duty cycles induce a wide variety of channel temperatures, which highlights the importance of considering the workload when studying SHE.
Published: 2020
Full Text: View/download PDF

41. MLCAD: A Survey of Research in Machine Learning for CAD Keynote Paper

Author: David Z. Pan, Yibo Lin, Jorg Henkel, Marilyn Wolf, Martin Rapp, Bei Yu, and Hussam Amrouch
Subjects: Computer science, Heuristic (computer science), Reliability (computer networking), Data processing & computer science, Brute-force search, CAD, Engineering and technology, Machine learning, Computer Graphics and Computer-Aided Design, Computer hardware & architecture, Set (abstract data type), Open research, Electrical engineering, electronic engineering, information engineering, Quality (business), Artificial intelligence, Configuration space, Electrical and Electronic Engineering, ddc:004, Software
Abstract: Due to the increasing size of integrated circuits (ICs), their design and optimization phases (i.e., computer-aided design, CAD) grow increasingly complex. At design time, a large design space needs to be explored to find an implementation that fulfills all specifications and then optimizes metrics like energy, area, delay, reliability, etc. At run time, a large configuration space needs to be searched to find the best set of parameters (e.g., voltage/frequency) to further optimize the system. Both spaces are infeasible for exhaustive search, typically leading to heuristic optimization algorithms that find some trade-off between design quality and computational overhead. Machine learning (ML) can build powerful models that have successfully been employed in related domains. In this survey, we categorize how ML may be used and is used for design-time and run-time optimization and exploration strategies of ICs. A meta-study of published techniques unveils areas in CAD that are well explored and underexplored with ML, as well as trends in the employed algorithms. We present a comprehensive categorization and summary of the state of the art on ML for CAD. Finally, we summarize remaining challenges and promising open research directions.
Published: 2022
Full Text: View/download PDF

42. Golden-Free Robust Age Estimation to Triage Recycled ICs

Author: Virinchi Roy Surabhi, Prashanth Krishnamurthy, Hussam Amrouch, Jorg Henkel, Ramesh Karri, and Farshad Khorrami
Subjects: Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
Published: 2023
Full Text: View/download PDF

43. Non-Traditional Design of Dynamic Logics using FDSOI for Ultra-Efficient Computing

Author: Shubham Kumar, Swetaki Chatterjee, Chetan Kumar Dabhi, Yogesh Singh Chauhan, and Hussam Amrouch
Subjects: Hardware and Architecture, Electrical and Electronic Engineering, Electronic, Optical and Magnetic Materials
Published: 2023
Full Text: View/download PDF

44. On the Efficiency of Voltage Overscaling under Temperature and Aging Effects

Author: Hussam Amrouch, Andreas Gerstlauer, Seyed Borna Ehsani, and Jorg Henkel
Subjects: Work (thermodynamics), Voltage reduction, Hardware: Performance and Reliability, Engineering and technology, Upper and lower bounds, Computer hardware & architecture, Theoretical Computer Science, Computational Theory and Mathematics, Hardware and Architecture, Control theory, Hardware: Integrated Circuits, Electrical engineering, electronic engineering, information engineering, Scaling, Software, Energy (signal processing), Electronic circuit, Efficient energy use, Voltage, Mathematics
Abstract: Voltage overscaling has received extensive attention in the last decade as an attractive paradigm for systems in which the resulting timing errors, and thus a loss in accuracy, can be accepted in exchange for an increase in energy efficiency. At the same time, the delay of a circuit is subject not only to voltage but also to temperature and aging. Existing work has largely studied voltage overscaling in isolation. This ignores interdependencies with temperature and aging, which can lead to wrong or misleading conclusions. In this work, we are the first to model the combined impact of voltage, temperature, and aging on the delay of circuits in order to investigate the actual trade-offs between efficiency and accuracy provided by voltage overscaling. We show that analyzing voltage in isolation overestimates timing errors and thus underestimates the voltage scaling potential. We further develop an approach that leverages interdependencies to optimize energy, delay, and accuracy trade-offs. We precisely translate the individual and combined impact of voltage-, temperature-, and aging-induced delay increase into the corresponding probability of error ($P_{error}$). This reveals that the same amount of timing increase results in different error probabilities depending on the origin (i.e., voltage, temperature, or aging). For the same timing increase, voltage reductions result in the smallest $P_{error}$ compared to temperature or aging, while also reducing temperature- and aging-induced delay increases themselves. This allows voltage reduction to be employed as an effective means to minimize delay, reduce energy, and thus maximize efficiency under a given upper bound on error probability. We apply our approach to multipliers in GPUs, exploring the trade-off between efficiency and accuracy. We demonstrate that accounting for voltage scaling alone leads to a considerably larger $P_{error}$ (74% on average) than in reality. Our investigation also shows that, for the same $P_{error}$ constraint, optimizing for combined voltage, temperature, and aging effects results, on average, in a 116% better energy-delay product (EDP) compared to the state of the art.
Published: 2019
Full Text: View/download PDF

45. Modeling and Evaluating the Gate Length Dependence of BTI

Author: Victor M. van Santen, Jorg Henkel, and Hussam Amrouch
Subjects: Applied physics, Transistor, Engineering and technology, Natural sciences, Computer hardware & architecture, Reliability engineering, Reliability (semiconductor), Logic gate, Physical sciences, Electrical engineering, electronic engineering, information engineering, Node (circuits), State (computer science), Static random-access memory, Electrical and Electronic Engineering, Degradation (telecommunications), Electronic circuit
Abstract: Bias temperature instability (BTI) is a major reliability concern in current and upcoming technologies. Current mitigation techniques mainly target the circuit and system levels by reducing the stimuli that govern BTI. Such mitigation techniques typically come with non-negligible overheads, which might not be acceptable, especially in smaller technology nodes where the available design margins are tighter. Semiconductor vendors report different BTI degradations in their technologies, and hence circuit designers need to consider such technology features when designing for reliability. Recently, TSMC reported that BTI strongly depends on the gate length ($L$) of transistors in their 10-nm technology node. In this brief, we are the first to investigate the role that the gate length dependence of BTI may play as a new degree of freedom in design for reliability. Based on the reported data from TSMC, we propose to incorporate the gate length dependency into a state-of-the-art physics-based BTI model. Then, we propose a new method of transistor stacking that optimizes circuits (e.g., SRAM cells) with respect to on/off current ratios, in which the $L$-dependency of BTI is taken into account. Compared to the state of the art, our approach results in BTI-hardened SRAM cells with a 50% better on/off current ratio along with $3\times$ better reliability at the cost of a 20% area overhead.
Published: 2019
Full Text: View/download PDF

46. Modeling and Mitigating Time-Dependent Variability From the Physical Level to the Circuit Level

Author: Victor M. van Santen, Hussam Amrouch, and Jörg Henkel
Subjects: Computer science, Transistor, Hardware_PERFORMANCEANDRELIABILITY, Upper and lower bounds, Noise (electronics), Reduction (complexity), Reliability (semiconductor), CMOS, Hardware and Architecture, Electronic engineering, Electrical and Electronic Engineering, Electronic circuit, Random dopant fluctuation
Abstract: Variability is one of the major challenges for CMOS in the nano era. Manufacturers test each circuit sample to ensure that samples that do not meet the desired specification are discarded. However, testing is only effective for variability that is observable right after manufacturing, such as geometric variations, work function, and random dopant fluctuation. This is in contrast to time-dependent variability (TDV), i.e., differences in the defects of transistors, which is not macroscopically observable immediately after manufacturing. In fact, defects are electrically neutral until they capture a carrier [with mechanisms called bias temperature instability (BTI) and random telegraph noise (RTN)] and thus become observable through their induced degradation. Therefore, transistors which are characterized identically after manufacturing will drift apart during their lifetime, as their susceptibility to effects such as BTI and RTN is different. In this paper, we model for the first time TDV from a defect-centric physical perspective all the way to the circuit level. Our novel defect-centric transistor reliability specification provides a fast, yet accurate method to estimate an upper bound for TDV on the transistor level, while our novel worst cell (WCL) and worst value (WVL) libraries allow for fast evaluation of the impact of TDV on the timing of circuits. Our approach is fully compatible with existing EDA tool flows, allowing us to model and optimize complex circuits like full microprocessors. By evaluating the impact of TDV with our reliability specification and variability-aware cell libraries, we are able to model TDV, which allowed us to reduce the required defect variability guardband by 46%. In addition, we provide design optimization strategies on each abstraction level, such as limiting continuous stress and transistor hardening, and we implement a novel variability-aware synthesis to achieve up to 57% additional guardband reduction.
Published: 2019
Full Text: View/download PDF

47. New Worst-Case Timing for Standard Cells Under Aging Effects

Author: Jörg Henkel, Victor M. van Santen, and Hussam Amrouch
Subjects: Applied physics, Standard cell, Computer science, Signoff, Transistor, Propagation delay, Natural sciences, Process corners, Electronic, Optical and Magnetic Materials, Control theory, Physical sciences, Inverter, Electrical and Electronic Engineering, Safety, Risk, Reliability and Quality, Electronic circuit, Degradation (telecommunications)
Abstract: The design of reliable circuits in current semiconductor technologies requires worst-case estimations of degradation effects during chip signoff. Hence, semiconductor vendors provide worst-case cell delays in the form of slow/slow process corners and best-case cell delays in fast/fast process corners. By providing these corner cases, EDA signoff tools can accurately estimate the circuit timing in which a reliable operation (i.e., no timing violations) is guaranteed for the projected lifetime. State of the art assumes that a standard cell exhibits the worst-case delay increase when all of its transistors uniformly exhibit worst-case aging-induced degradation. As our first contribution, we are the first to demonstrate that this assumption is incorrect and leads to a considerable underestimation of up to 55% in circuit timing. To find the worst-case cell delay, instead of searching across all combinations of non-uniform transistor degradations, we propose reducing the search space by exploiting circuit topology, that is, using cell input vectors to determine transistor duty cycles. Our aim is to find the worst-case input vectors of a cell, which lead to the highest possible shift in rise and fall propagation delay for each standard cell. Since the number of inputs of a standard cell is significantly smaller than its number of transistors, exploring this reduced search space becomes feasible. We show how considering a uniform worst-case degradation for each transistor underestimates the actual degradation in standard cells. In fact, actual non-uniform worst-case input vectors result in 83% higher standard cell delay on average (compared to applying peak degradation uniformly) with a peak of $60\times$ for an inverter under a high load capacitance.
Published: 2019
Full Text: View/download PDF

48. Estimating and Mitigating Aging Effects in Routing Network of FPGAs

Author: Jörg Henkel, Behnam Khaleghi, Behzad Omidi, Hussam Amrouch, and Hossein Asadi
Subjects: Interconnection, Computer science, Engineering and technology, Multiplexer, Multiplexing, Computer hardware & architecture, Reduction (complexity), Hardware and Architecture, Embedded system, Datapath, Hardware_INTEGRATEDCIRCUITS, Electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Routing (electronic design automation), Field-programmable gate array, Software
Abstract: In this paper, we present a comprehensive analysis of the impact of aging on the interconnection network of field-programmable gate arrays (FPGAs) and propose novel approaches to mitigate the aging effects on the routing network. We first show the insignificant impact of aging on data integrity of FPGAs, i.e., static noise margin and soft error rate of the configuration cells, and we also show the negligible impact of the mentioned degradations on the FPGA performance. As such, we focus on the performance degradation of datapath transistors. In this regard, we propose a routing accompanied by a placement algorithm that prevents constant stress on transistors by evenly distributing the stress through the interconnection resources. By observing the impact of the signal probability on the aging of routing buffers, we enhance the synthesis flow as well as augment the proposed routing algorithm to converge the signal probabilities toward aging-friendly values. Experimental results over a set of industrial benchmarks and a commercial-like FPGA architecture indicate the effectiveness of the proposed method with 64.3% reduction of stress duration in multiplexers and up to 45.2% improvement of the degradation of buffers. Altogether, the proposed method reduces the timing guardband by 14.1% to 31.7%, depending on the FPGA routing architecture.
Published: 2019
Full Text: View/download PDF

49. Dynamic Guardband Selection: Thermal-Aware Optimization for Unreliable Multi-Core Systems

Author: Heba Khdr, Jörg Henkel, and Hussam Amrouch
Subjects: Multi-core processor, Computer science, Transistor, Hardware_PERFORMANCEANDRELIABILITY, Engineering and technology, Computer hardware & architecture, Theoretical Computer Science, Reliability engineering, Threshold voltage, Reliability (semiconductor), Computational Theory and Mathematics, Hardware and Architecture, Electrical engineering, electronic engineering, information engineering, Key (cryptography), Software, Selection (genetic algorithm)
Abstract: Circuit aging has become the major reliability concern in current and upcoming technology nodes. For instance, Bias Temperature Instability (BTI) leads to an increase in the threshold voltage of a transistor. That, in turn, may prolong the critical path delay of the processor and eventually may lead to timing errors. In order to avoid aging-induced timing errors, designers employ guardbands either with respect to voltage or frequency. State-of-the-art techniques determine a guardband type at the circuit level at design time irrespective of the running workload at the system level. Our investigation revealed that the temperatures generated by a running workload have the potential to play a key role in determining the appropriate guardband type with respect to system performance. Therefore, we propose a paradigm shift in designing guardbands: to select the guardband types on-the-fly with respect to the workload-induced temperatures, aiming at optimizing for performance under temperature and reliability constraints. Moreover, different guardband types for different cores can be selected simultaneously when multiple applications with diverse properties suggest this to be useful. Our dynamic guardband selection allows for a higher performance compared to techniques that employ a fixed (at design time) guardband type throughout.
Published: 2019
Full Text: View/download PDF

50. Device to Circuit Framework for Activity-Dependent NBTI Aging in Digital Circuits

Author: Subrat Mishra, A. Thirunavukkarasu, Chetan Kumar Dabhi, Nilesh Goel, Hussam Amrouch, Souvik Mahapatra, Yogesh Singh Chauhan, Narendra Parihar, Jörg Henkel, and Jerin Joe
Subjects: Applied physics, Digital electronics, Negative-bias temperature instability, Computer science, DC analysis, Hardware_PERFORMANCEANDRELIABILITY, Natural sciences, Signal, Upper and lower bounds, Electronic, Optical and Magnetic Materials, Stress (mechanics), Logic gate, Physical sciences, Hardware_INTEGRATEDCIRCUITS, Electronic engineering, Electrical and Electronic Engineering, Hardware_LOGICDESIGN, Degradation (telecommunications)
Abstract: A framework is proposed for activity-dependent timing degradation due to p-FET negative bias temperature instability (NBTI) in digital circuits. A fixed-time compact model is proposed for NBTI and validated with physical model predictions for various digital circuits under different input signal slew and fan-out load conditions. The model is used to predict the timing degradation in digital circuits under arbitrary input activities. An equivalent degradation level is found that can be applied to all p-FETs in the circuit and can serve as an upper bound of degradation due to arbitrary input activity and avoid the conservative worst-case DC analysis. The activity dependence is studied in microprocessors as well as arithmetic circuits under different actual workloads.
Published: 2019
Full Text: View/download PDF

62 results on '"Hussam Amrouch"'
