30 results for "Avinash Karanth"
Search Results
2. Ascend: A Scalable and Energy-Efficient Deep Neural Network Accelerator With Photonic Interconnects
- Author
- Yuan Li, Ke Wang, Hao Zheng, Ahmed Louri, and Avinash Karanth
- Subjects
- Hardware and Architecture, Electrical and Electronic Engineering
- Published
- 2022
- Full Text
- View/download PDF
3. Exploiting Wireless Technology for Energy-Efficient Accelerators With Multiple Dataflows and Precision
- Author
- Siqin Liu, Talha Furkan Canan, Harshavardhan Chenji, Soumyasanta Laha, Savas Kaya, and Avinash Karanth
- Subjects
- Hardware and Architecture, Electrical and Electronic Engineering
- Published
- 2022
- Full Text
- View/download PDF
4. DAGGER: Exploiting Language Semantics for Program Security in Embedded Systems
- Author
- Garret Cunningham, Harsha Chenji, David Juedes, Gordon Stewart, and Avinash Karanth
- Published
- 2023
- Full Text
- View/download PDF
5. Reflections of Cybersecurity Workshop for K-12 Teachers
- Author
- Chad Mourning, Harsha Chenji, Allyson Hallman-Thrasher, Savas Kaya, Nasseef Abukamail, David Juedes, and Avinash Karanth
- Published
- 2023
- Full Text
- View/download PDF
6. Advanced Machine Learning Techniques to Predict GvHD Occurrence and Severity with High Accuracy
- Author
- Hannah Choe, Nicholas Yuhasz, Greer Elizabeth Miller, Me'kayla Travis, Avinash Karanth, Kyle Shifflet, and Parvathi Ranganathan
- Subjects
- Immunology, Cell Biology, Hematology, Biochemistry
- Published
- 2022
- Full Text
- View/download PDF
7. Hardware-Level Thread Migration to Reduce On-Chip Data Movement Via Reinforcement Learning
- Author
- Razvan Bunescu, Ahmed Louri, Kyle Shiflett, Quintin Fettes, and Avinash Karanth
- Subjects
- Computer science, Distributed computing, Energy consumption, Thread (computing), Computer Graphics and Computer-Aided Design, Execution time, Computer hardware & architecture, Instruction set, Data access, Scalability, Reinforcement learning, Electrical and Electronic Engineering, Latency (engineering), Software
- Abstract
As the number of processing cores and associated threads in chip multiprocessors (CMPs) continues to scale out, on-chip memory access latency dominates application execution time due to increased data movement. Although tiled CMP architectures with distributed shared caches provide a scalable design, increased physical distance between requesting and responding cores has led to both increased on-chip memory access latency and excess energy consumption. Near data processing is a promising approach that can migrate threads closer to data; however, prior hand-engineered rules for fine-grained hardware-level thread migration are either too slow to react to changes in data access patterns, or unable to exploit the large variety of data access patterns. In this article, we propose to use reinforcement learning (RL) to learn relatively complex data access patterns to improve on hardware-level thread migration techniques. By utilizing the recent history of memory access locations as input, each thread learns to recognize the relationship between prior access patterns and future memory access locations. This leads to the unique ability of the proposed technique to make fewer, more effective migrations to intermediate cores that minimize the distance to multiple distinct memory access locations. By allowing a low-overhead RL agent to learn a policy from real interaction with parallel programming benchmarks in a parallel simulator, we show that a migration policy which recognizes more complex data access patterns can be learned. The proposed approach reduces on-chip data movement and energy consumption by an average of 41%, while reducing execution time by 43% when compared to a simple baseline with no thread migration; furthermore, energy consumption and execution time are reduced by an additional 10% when compared to a hand-engineered fine-grained migration policy. (A toy sketch of the distance-minimizing migration choice follows this entry.)
- Published
- 2020
- Full Text
- View/download PDF
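As a companion to the abstract above, here is a minimal, hypothetical Python sketch of one core idea: use the recent history of memory access locations to pick a migration target on a 2D mesh that minimizes total hop distance to the distinct home tiles, rather than chasing only the most recent access. The function name, the 4x4 mesh size, and the example history are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch (not the paper's code): pick a thread-migration target on a 2D mesh
# that minimizes total Manhattan distance to the home tiles of recently accessed data.
from collections import Counter

MESH_W, MESH_H = 4, 4          # assumed 4x4 tiled CMP

def tile_xy(core_id):
    """Map a core id to (x, y) mesh coordinates."""
    return core_id % MESH_W, core_id // MESH_W

def hop_distance(a, b):
    """Manhattan hop count between two cores on the mesh."""
    ax, ay = tile_xy(a)
    bx, by = tile_xy(b)
    return abs(ax - bx) + abs(ay - by)

def choose_migration_target(recent_home_tiles):
    """Return the core that minimizes expected hops to recently used data.

    recent_home_tiles: core ids owning the cache lines the thread touched
    recently (repeats act as frequency weights).
    """
    weights = Counter(recent_home_tiles)
    def expected_hops(candidate):
        return sum(w * hop_distance(candidate, home) for home, w in weights.items())
    return min(range(MESH_W * MESH_H), key=expected_hops)

# Example: accesses spread over tiles 1, 7, and 13 -> an intermediate tile wins.
history = [1, 7, 13]
print(choose_migration_target(history))   # prints 5, the tile at (1, 1)
```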
8. SPACX: Silicon Photonics-based Scalable Chiplet Accelerator for DNN Inference
- Author
- Yuan Li, Ahmed Louri, and Avinash Karanth
- Published
- 2022
- Full Text
- View/download PDF
9. Reflections of Cybersecurity Workshop for K-12 Teachers and High School Students
- Author
- Chad Mourning, David Juedes, Allyson Hallman-Thrasher, Harsha Chenji, Savas Kaya, and Avinash Karanth
- Published
- 2022
- Full Text
- View/download PDF
10. Scaling Deep-Learning Inference with Chiplet-based Architecture and Photonic Interconnects
- Author
- Ahmed Louri, Yuan Li, and Avinash Karanth
- Subjects
- Interconnection, Multicast, Computer architecture, Computer science, Dataflow, Deep learning, Scalability, Artificial intelligence, Energy consumption, Data transmission, Efficient energy use
- Abstract
Chiplet-based architectures have been proposed to scale computing systems for deep neural networks (DNNs). Prior work has shown that for chiplet-based DNN accelerators, the electrical network connecting the chiplets poses a major challenge to system performance, energy consumption, and scalability. Emerging interconnect technologies such as silicon photonics can potentially overcome the challenges facing electrical interconnects, as photonic interconnects provide high bandwidth density, superior energy efficiency, and ease of implementing broadcast and multicast operations that are prevalent in DNN inference. In this paper, we propose a chiplet-based architecture named SPRINT for DNN inference. SPRINT uses a global buffer to simplify the data transmission between storage and computation, and includes two novel designs: (1) a reconfigurable photonic network that can support diverse communications in DNN inference with minimal implementation cost, and (2) a customized dataflow that exploits the ease of broadcast and multicast offered by photonic interconnects to support highly parallel DNN computations. Simulation studies using the ResNet50 DNN model show that SPRINT achieves 46% and 61% reductions in execution time and energy consumption, respectively, as compared to other state-of-the-art chiplet-based architectures with electrical or photonic interconnects. (A toy sketch of a broadcast-based dataflow follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
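A minimal sketch, under assumed parameters, of why broadcast-friendly interconnects suit the dataflow described above: the same input activation vector is broadcast once to every chiplet, and each chiplet computes a disjoint slice of the output neurons. The chiplet count and layer sizes below are illustrative, not taken from the paper.

```python
# Toy sketch of a broadcast-based dataflow for a fully connected DNN layer:
# the input vector is broadcast once to all chiplets; each chiplet holds the
# weight rows for its slice of output neurons and produces that slice locally.
import numpy as np

N_CHIPLETS = 4               # assumed chiplet count
IN_DIM, OUT_DIM = 512, 256   # assumed layer dimensions

rng = np.random.default_rng(0)
x = rng.standard_normal(IN_DIM)                 # input activations
W = rng.standard_normal((OUT_DIM, IN_DIM))      # layer weights

# Partition output neurons across chiplets (weights stay resident per chiplet).
slices = np.array_split(np.arange(OUT_DIM), N_CHIPLETS)

# One photonic broadcast of x replaces N_CHIPLETS electrical unicasts.
broadcasts_photonic = 1
unicasts_electrical = N_CHIPLETS

# Each chiplet computes its output slice independently.
y_parts = [W[idx, :] @ x for idx in slices]
y = np.concatenate(y_parts)

assert np.allclose(y, W @ x)   # distributed result matches the monolithic MVM
print(f"input transmissions: photonic broadcast = {broadcasts_photonic}, "
      f"electrical unicast = {unicasts_electrical}")
```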
11. Dynamic Voltage and Frequency Scaling to Improve Energy-Efficiency of Hardware Accelerators
- Author
- Siqin Liu and Avinash Karanth
- Published
- 2021
- Full Text
- View/download PDF
12. WiNN: Wireless Interconnect based Neural Network Accelerator
- Author
- Siqin Liu, Sushanth Karmunchi, Avinash Karanth, Soumyasanta Laha, and Savas Kaya
- Published
- 2021
- Full Text
- View/download PDF
13. Parallel Dot Products Using Silicon Photonics
- Author
- Avinash Karanth, Kyle Shiflett, and Andy Wolff
- Subjects
- Silicon photonics, Artificial neural network, Computer science, Computation, Deep learning, Neural and Evolutionary Computation, Optical computing, Dot product, Noise (electronics), Hardware Architecture, Electronic engineering, Artificial intelligence, Photonics
- Abstract
This paper proposes a parallel photonic architecture for computing dense dot products, such as those found during deep neural network (DNN) inference, and quantifies the architecture's computation error induced by crosstalk and noise.
- Published
- 2021
- Full Text
- View/download PDF
14. Ultracompact and Low-Power Logic Circuits via Workfunction Engineering
- Author
- Avinash Karanth, Savas Kaya, Ahmed Louri, and Talha Furkan Canan
- Subjects
- Computer engineering, Computer hardware, Computer science, 4T AND-OR-invert (AOI), 6T 4-to-1 multiplexer (MUX), NAND gate, Multiplexer, Reduction (complexity), Electronic engineering, Electrical and Electronic Engineering, Nanotechnology, CMOS logic gates, Transistor, 3T-XOR, Ambipolar, Electronic, Optical and Magnetic Materials, CMOS, Hardware and Architecture, Logic gate, XOR gate, NOR gate
- Abstract
An extensive analysis of sub-10-nm logic building blocks utilizing ultracompact logic gates based on the recently proposed gate workfunction engineering (WFE) approach is provided. WFE sets the WF in the contacts as well as the two independent gates of an ambipolar Schottky-barrier (SB) FinFET to alter the thresholds of the two channels, as a unique leverage to modify the logic functionality out of a single transistor. Thus, a single-transistor (1T) CMOS pass-gate, 2T NAND and NOR gates, as well as 3T or 4T XOR gates with a substantial reduction in overall area (50%) and power dissipation (up to 10×) can be implemented. To harness this potential and illustrate the capabilities of these compact ambipolar transistors, novel logic building blocks, including a 6T multiplexer, 8T full-adder, 4T latch, 6T D-type flip-flop, and 4T AND-OR-invert (AOI) gates, are developed. Besides the logic verification using 7-nm devices, the dynamic performance of the proposed logic circuits is also analyzed. The comparative simulation study shows that WFE in independent-gate SB-FinFETs can lead to absolutely minimalist CMOS logic blocks without significant degradation to overall power-delay product (PDP) performance.
- Published
- 2019
- Full Text
- View/download PDF
15. Dynamic Voltage and Frequency Scaling in NoCs with Supervised and Reinforcement Learning Techniques
- Author
- Avinash Karanth, Razvan Bunescu, Ahmed Louri, Quintin Fettes, and Mark Clark
- Subjects
- Router, Optimization problem, General Computer Science, Computer science, Reliability (computer networking), Task (project management), Theoretical Computer Science, Reinforcement learning, Frequency scaling, Throughput (business), Leakage (electronics), Multi-core processor, Transistor, Computer hardware & architecture, Computational Theory and Mathematics, Computer engineering, Hardware and Architecture, Scalability, Software engineering, Software, Voltage
- Abstract
Network-on-Chips (NoCs) are the de facto choice for designing the interconnect fabric in multicore chips due to their regularity, efficiency, simplicity, and scalability. However, NoCs suffer from excessive static power and dynamic energy due to transistor leakage current and data movement between the cores and caches. Power consumption issues are only exacerbated by ever-decreasing technology sizes. Dynamic Voltage and Frequency Scaling (DVFS) is one technique that seeks to reduce dynamic energy; however, this often occurs at the expense of performance. In this paper, we propose LEAD, Learning-enabled Energy-Aware Dynamic voltage/frequency scaling for multicore architectures, using both supervised learning and reinforcement learning approaches. LEAD groups the router and its outgoing links into the same V/F domain and implements proactive DVFS mode management strategies that rely on offline-trained machine learning models in order to provide optimal V/F mode selection between different voltage/frequency pairs. We present three supervised learning versions of LEAD that are based on buffer utilization, change in buffer utilization, and change in energy/throughput, which allow proactive mode selection based on accurate prediction of future network parameters. We then describe a reinforcement learning approach to LEAD that optimizes the DVFS mode selection directly, obviating the need for label and threshold engineering. Simulation results using PARSEC and Splash-2 benchmarks on a 4 × 4 concentrated mesh architecture show that by using supervised learning, LEAD can achieve an average dynamic energy savings of 15.4 percent for a loss in throughput of 0.8 percent with no significant impact on latency. When reinforcement learning is used, LEAD increases average dynamic energy savings to 20.3 percent at the cost of a 1.5 percent decrease in throughput and a 1.7 percent increase in latency. Overall, the more flexible reinforcement learning approach enables learning an optimal behavior for a wider range of load environments under any desired energy versus throughput tradeoff. (A toy sketch of proactive V/F mode selection follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
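To make the proactive mode-selection idea above concrete, here is a small hypothetical sketch: a least-squares predictor estimates next-window router buffer utilization from recent history, and the prediction is mapped to one of a few V/F pairs. The window length, thresholds, and V/F table are invented for illustration and are not LEAD's trained models.

```python
# Toy sketch of proactive DVFS mode selection from predicted buffer utilization.
# A linear predictor (fit offline with least squares) forecasts the next
# window's utilization; the forecast indexes into an assumed V/F table.
import numpy as np

HIST = 3                                          # history length (assumed)
VF_TABLE = [(0.7, 1.0), (0.9, 1.5), (1.1, 2.0)]   # (volts, GHz), illustrative

def fit_predictor(traces):
    """Fit w so that utilization[t] ~ w . utilization[t-HIST:t] (offline training)."""
    X, y = [], []
    for tr in traces:
        for t in range(HIST, len(tr)):
            X.append(tr[t - HIST:t])
            y.append(tr[t])
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return w

def select_vf(w, recent_util):
    """Pick a V/F pair proactively from the predicted next-window utilization."""
    pred = float(np.dot(w, recent_util))
    if pred < 0.3:
        return VF_TABLE[0]      # low load -> low V/F, save dynamic energy
    if pred < 0.7:
        return VF_TABLE[1]
    return VF_TABLE[2]          # high load -> high V/F, protect throughput

# Synthetic training traces of per-window buffer utilization in [0, 1).
rng = np.random.default_rng(1)
traces = [np.clip(np.cumsum(rng.normal(0, 0.05, 200)) % 1.0, 0, 1) for _ in range(4)]
w = fit_predictor(traces)
print(select_vf(w, traces[0][-HIST:]))
```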
16. Bitwise Neural Network Acceleration Using Silicon Photonics
- Author
- Avinash Karanth, Kyle Shiflett, Razvan Bunescu, and Ahmed Louri
- Subjects
- Silicon photonics, Speedup, Artificial neural network, Computer science, Latency (audio), Matrix multiplication, Optics, Reduction (complexity), Optoelectronics & photonics, Computer engineering, Bitwise operation, Efficient energy use
- Abstract
Hardware accelerators provide significant speedup and improve energy efficiency for several demanding deep neural network (DNN) applications. DNNs have several hidden layers that perform concurrent matrix-vector multiplications (MVMs) between the network weights and input features. As MVMs are critical to the performance of DNNs, previous research has optimized the performance and energy efficiency of MVMs at both the architecture and algorithm levels. In this paper, we propose to use emerging silicon photonics technology to improve parallelism, speed, and overall efficiency with the goal of providing real-time inference and fast training of neural nets. We use microring resonators (MRRs) and Mach-Zehnder interferometers (MZIs) to design two versions (all-optical and partial-optical) of hybrid matrix multiplications for DNNs. Our results indicate that our partial-optical design gave the best performance in both energy efficiency and latency, with a 33.1% reduction in energy-delay product (EDP) under conservative estimates and a 76.4% reduction under aggressive estimates. (A toy sketch of bitwise MVM decomposition follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
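The bit-level decomposition that the abstract alludes to can be illustrated with a short sketch: an integer matrix-vector multiplication is split into per-bit partial products (the kind of per-bit parallelism a photonic MRR/MZI array could exploit) and recombined with the appropriate powers of two. The 4-bit unsigned weight precision and array sizes are assumptions for illustration only.

```python
# Toy sketch of bitwise MVM decomposition: y = W @ x is computed as a weighted
# sum of bit-plane products. Precision and sizes are illustrative.
import numpy as np

BITS = 4                                       # assumed unsigned weight precision
rng = np.random.default_rng(2)
W = rng.integers(0, 2**BITS, size=(8, 16))     # integer weights
x = rng.integers(0, 4, size=16)                # integer input activations

def bitwise_mvm(W, x, bits=BITS):
    """Accumulate per-bit partial MVMs: sum_b 2^b * (bit_plane_b(W) @ x)."""
    y = np.zeros(W.shape[0], dtype=np.int64)
    for b in range(bits):
        bit_plane = (W >> b) & 1               # 0/1 matrix holding weight bit b
        y += (bit_plane @ x) << b              # scale the partial product by 2^b
    return y

assert np.array_equal(bitwise_mvm(W, x), W @ x)
print(bitwise_mvm(W, x))
```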
17. CSCNN: Algorithm-hardware Co-design for CNN Accelerators using Centrosymmetric Filters
- Author
- Jiajun Li, Avinash Karanth, Ahmed Louri, and Razvan Bunescu
- Subjects
- Finite impulse response, Computer science, Computation, Deep learning, Convolutional neural network, Computer hardware & architecture, Redundancy (information theory), Artificial intelligence, Pruning (decision trees), Algorithm, Computer hardware, Energy (signal processing), Efficient energy use
- Abstract
Convolutional neural networks (CNNs) are at the core of many state-of-the-art deep learning models in computer vision, speech, and text processing. Training and deploying such CNN-based architectures usually require a significant amount of computational resources. Sparsity has emerged as an effective compression approach for reducing the amount of data and computation for CNNs. However, sparsity often results in computational irregularity, which prevents accelerators from fully taking advantage of its benefits for performance and energy improvement. In this paper, we propose CSCNN, an algorithm/hardware co-design framework for CNN compression and acceleration that mitigates the effects of computational irregularity and provides better performance and energy efficiency. On the algorithmic side, CSCNN uses centrosymmetric matrices as convolutional filters. In doing so, it reduces the number of required weights by nearly 50% and enables structured computational reuse without compromising regularity and accuracy. Additionally, complementary pruning techniques are leveraged to further reduce computation by a factor of 2.8-7.2× with a marginal accuracy loss. On the hardware side, we propose a CSCNN accelerator that effectively exploits the structured computational reuse enabled by centrosymmetric filters, and further eliminates zero computations for increased performance and energy efficiency. Compared against a dense accelerator, SCNN, and SparTen, the proposed accelerator performs 3.7×, 1.6×, and 1.3× better, and improves the EDP (energy-delay product) by 8.9×, 2.8×, and 2.0×, respectively. (A toy sketch of centrosymmetric filter construction follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
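A brief sketch of the centrosymmetric-filter idea described above: a K×K filter satisfying w[i, j] = w[K-1-i, K-1-j] is fully determined by roughly half of its entries, so only about 50% of the weights need to be stored. The construction below is a generic illustration, not CSCNN's training or accelerator code.

```python
# Toy sketch: build a centrosymmetric KxK convolution filter from ~half the
# parameters, i.e. w[i, j] == w[K-1-i, K-1-j] for all (i, j).
import numpy as np

def centrosymmetric_filter(free_params, k):
    """Expand ceil(k*k/2) free parameters into a k x k centrosymmetric filter."""
    n_free = (k * k + 1) // 2
    assert len(free_params) == n_free
    flat = np.empty(k * k)
    flat[:n_free] = free_params
    # Mirror the first half onto the second half (180-degree rotation symmetry).
    flat[n_free:] = free_params[: k * k - n_free][::-1]
    return flat.reshape(k, k)

k = 3
free = np.arange(1.0, 1.0 + (k * k + 1) // 2)    # 5 free parameters for a 3x3 filter
w = centrosymmetric_filter(free, k)
assert np.allclose(w, np.rot90(w, 2))            # centrosymmetry check
print(w)
print(f"stored weights: {len(free)} of {k * k}  (~{100 * len(free) // (k * k)}%)")
```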
18. GCNAX: A Flexible and Energy-efficient Accelerator for Graph Convolutional Neural Networks
- Author
- Razvan Bunescu, Jiajun Li, Ahmed Louri, and Avinash Karanth
- Subjects
- Loop (graph theory), Speedup, Computer science, Dataflow, Loop fusion, Deep learning, Graph theory, Parallel computing, Convolutional neural network, Computer hardware & architecture, Artificial intelligence, Throughput (business)
- Abstract
Graph convolutional neural networks (GCNs) have emerged as an effective approach to extend deep learning for graph data analytics. Given that graphs are usually irregular, as nodes in a graph may have a varying number of neighbors, processing GCNs efficiently poses a significant challenge on the underlying hardware. Although specialized GCN accelerators have been proposed to deliver better performance over generic processors, prior accelerators not only under-utilize the compute engine, but also impose redundant data accesses that reduce throughput and energy efficiency. Therefore, optimizing the overall flow of data between compute engines and memory, i.e., the GCN dataflow, which maximizes utilization and minimizes data movement, is crucial for achieving efficient GCN processing. In this paper, we propose a flexible and optimized dataflow for GCNs that simultaneously improves resource utilization and reduces data movement. This is realized by fully exploring the design space of GCN dataflows and evaluating the number of execution cycles and DRAM accesses through an analysis framework. Unlike prior GCN dataflows, which employ rigid loop orders and loop fusion strategies, the proposed dataflow can reconfigure the loop order and loop fusion strategy to adapt to different GCN configurations, which results in much improved efficiency. We then introduce a novel accelerator architecture called GCNAX, which tailors the compute engine, buffer structure, and size based on the proposed dataflow. Evaluated on five real-world graph datasets, our simulation results show that GCNAX reduces DRAM accesses by a factor of 8.1× and 2.4×, while achieving 8.9× and 1.6× speedup and 9.5× and 2.3× energy savings on average over HyGCN and AWB-GCN, respectively. (A toy sketch of the loop-order trade-off follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
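The loop-order point in the abstract can be illustrated with the GCN layer computation A·X·W chained in two orders, (A·X)·W versus A·(X·W): when A is sparse and W shrinks the feature dimension, the two orders do very different amounts of work. The graph size, feature dimensions, and nonzero count below are made-up illustration values, and the sketch is not GCNAX's analysis framework.

```python
# Toy sketch: multiply-accumulate counts for the two chaining orders of a GCN
# layer A @ X @ W, where A is a sparse N x N adjacency matrix, X is N x F
# node features, and W is an F x C weight matrix. Sizes are illustrative.
N, F, C = 10_000, 512, 64        # nodes, input features, output channels (assumed)
nnz_A = 1_000_000                # nonzeros in the sparse adjacency matrix (assumed)

# Order 1: (A @ X) @ W -> sparse-dense product over F columns, then dense GEMM.
macs_order1 = nnz_A * F + N * F * C

# Order 2: A @ (X @ W) -> dense GEMM first (shrinks features F -> C),
# then the sparse-dense product only over C columns.
macs_order2 = N * F * C + nnz_A * C

print(f"(A@X)@W : {macs_order1:,} MACs")
print(f"A@(X@W) : {macs_order2:,} MACs")
print(f"ratio   : {macs_order1 / macs_order2:.2f}x fewer MACs with the second order")
```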
19. 4-Input NAND and NOR Gates Based on Two Ambipolar Schottky Barrier FinFETs
- Author
- Talha Furkan Canan, Savas Kaya, Ahmed Louri, and Avinash Karanth
- Subjects
- Materials science, Ambipolar diffusion, Schottky barrier, Transistor, Semiconductor device modeling, NAND gate, Nanotechnology, Logic gate, Optoelectronics, Voltage, NOR gate
- Abstract
We report on four-input NAND and NOR gates using only two 7-nm Schottky-barrier (SB) independent-gate FinFET transistors that take advantage of gate workfunction engineering (WFE). Careful optimization of the workfunctions at the source/drain contacts as well as the two independent gates of the SB-FinFETs provides unprecedented control of the threshold in the ambipolar device operation. It is used in this work to tailor 4-input NAND and NOR functionalities out of only two transistors (2T), utilizing only two different metal workfunctions in a given gate. Correct operation of the multi-input gates for supply voltages as low as VDD = 0.5 V has been verified using 2D TCAD circuit simulations. Switching performance of the proposed 4-input gates indicates a 45% reduction in power-delay product (PDP) as compared to conventional 16T FinFET counterparts, which is due to substantially lower power dissipation at the expense of slower transitions. A JK flip-flop circuit is designed using the proposed four-input NAND gate to illustrate its advantages for logic operation.
- Published
- 2020
- Full Text
- View/download PDF
20. Energy-Efficient Multiply-and-Accumulate using Silicon Photonics for Deep Neural Networks
- Author
- Avinash Karanth, Ahmed Louri, Kyle Shiflett, and Razvan Bunescu
- Subjects
- Silicon photonics, Medical informatics, Computer science, Deep neural networks, Latency (engineering), Efficient energy use, Computational science
- Abstract
We propose two optical hybrid matrix multipliers for deep neural networks. Our results indicate our all-optical design achieved the best performance in energy efficiency and latency, with an energy-delay product reduction of 33.1% and 76.4% for conservative and aggressive estimates, respectively.
- Published
- 2020
- Full Text
- View/download PDF
21. Reconfigurable Gates with Sub-10nm Ambipolar SB-FinFETs for Logic Locking & Obfuscation
- Author
- Savas Kaya, Talha Furkan Canan, Harsha Chenji, and Avinash Karanth
- Subjects
- Computer science, Transistor, Reconfigurability, NAND gate, Integrated circuit design, Nanotechnology, Computer hardware & architecture, Obfuscation (software), Logic gate, Electronic engineering, XOR gate, Electronic circuit
- Abstract
Examples of compact reconfigurable logic gates utilizing Schottky-barrier (SB) FinFETs that take advantage of workfunction engineering (WFE) are provided. When applied to independent-gate SB-FinFETs, WFE has been shown to be capable of forming minimalist two-transistor (2T) NAND, NOR, and XOR logic gates that can lower power and area requirements by factors of 5× and 2×, respectively, with similar PDP figures as conventional FinFET logic circuits. Here, we introduce a new class of workfunction-engineered reconfigurable logic gates that can implement multiple logic functionalities. Examples of reconfigurable gates include 2T gates that can implement (1-NOR-NOR-NAND) and (0-NAND-NAND-NOR) output functions as well as a 4T gate (1-AB’-A’B-OR) that can provide four distinct functional outputs with only two select bits. TCAD simulations are used to verify logic operation and disclose the dynamic performance of the proposed ultra-compact reconfigurable circuits. Optimized for maximum reconfigurability, the PDP figures are higher by an order of magnitude as compared to static logic circuits built using the same FinFET architecture. The fine-grain and ultra-fast reconfigurability of these circuits would be especially welcome for the logic-locking and obfuscation operations used to enhance hardware security in IC design. (A toy behavioral sketch of a key-programmable gate follows this entry.)
- Published
- 2020
- Full Text
- View/download PDF
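To illustrate the logic-locking use case mentioned above, here is a small behavioral sketch (plain Python, not a circuit or TCAD model): a two-select-bit reconfigurable cell chooses among four Boolean functions, so a wrong key makes the locked netlist compute the wrong function. The function set loosely mirrors the 4T (1, AB', A'B, OR) example from the abstract; the locked-XOR wiring is an invented example.

```python
# Toy behavioral model of a key-programmable reconfigurable gate used for
# logic locking: two select (key) bits pick one of four Boolean functions.
def reconfigurable_gate(a: int, b: int, s1: int, s0: int) -> int:
    """Return the selected function of inputs a, b for select bits (s1, s0)."""
    functions = {
        (0, 0): lambda a, b: 1,               # constant 1
        (0, 1): lambda a, b: a & (b ^ 1),     # A AND NOT B  (AB')
        (1, 0): lambda a, b: (a ^ 1) & b,     # NOT A AND B  (A'B)
        (1, 1): lambda a, b: a | b,           # OR
    }
    return functions[(s1, s0)](a, b)

def locked_xor(a: int, b: int, key: tuple) -> int:
    """A tiny 'locked' circuit: ORs two key-configured cells. Only the correct
    key makes the two cells compute AB' and A'B, i.e. XOR overall."""
    k1, k2 = key
    return reconfigurable_gate(a, b, *k1) | reconfigurable_gate(a, b, *k2)

CORRECT_KEY = ((0, 1), (1, 0))    # AB' OR A'B == XOR
WRONG_KEY   = ((1, 1), (1, 1))    # OR  OR  OR == OR, functionally wrong

for a in (0, 1):
    for b in (0, 1):
        print(a, b, locked_xor(a, b, CORRECT_KEY), locked_xor(a, b, WRONG_KEY))
# With the correct key the third column is XOR; with a wrong key it is not.
```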
22. DozzNoC: Reducing Static and Dynamic Energy in NoCs with Low-latency Voltage Regulators using Machine Learning
- Author
- Ahmed Louri, Mark Clark, Avinash Karanth, Brian Ma, and Yingping Chen
- Subjects
- Power management, Power gating, Computer science, Electrical & electronic engineering, Voltage regulator, Inductor, Machine learning, Computer hardware & architecture, Dynamic voltage scaling, Energy conservation, Network on a chip, Low-power electronics, Artificial intelligence, Latency (engineering), Frequency scaling, Voltage
- Abstract
Network-on-chips (NoCs) continue to be the choice of communication fabric in multicore architectures because the NoC effectively combines the resource efficiency of the bus with the parallelizability of the crossbar. As NoCs suffer from both high static and dynamic energy consumption, power-gating and dynamic voltage and frequency scaling (DVFS) have been proposed in the literature to improve energy efficiency. In this work, we propose DozzNoC, an adaptable power management technique that effectively combines power-gating and DVFS techniques to target both static power and dynamic energy reduction with a single-inductor multiple-output (SIMO) voltage regulator. The proposed power management design is further enhanced by machine learning techniques that predict future traffic load for proactive DVFS mode selection. DozzNoC utilizes a SIMO voltage regulator scheme that allows for fast, low-powered, and independently power-gated or voltage-scaled routers, such that each router and its outgoing links share the same voltage/frequency domain. Our simulation results using PARSEC and Splash-2 benchmarks on an 8 × 8 mesh network show that for a 7% decrease in throughput, we can achieve an average dynamic energy savings of 25% and an average static power reduction of 53%. (A toy sketch of the combined power-gating/DVFS decision follows this entry.)
- Published
- 2020
- Full Text
- View/download PDF
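A small hypothetical sketch of the combined decision the abstract describes: a per-router controller first decides whether the (router + outgoing links) domain can be power-gated based on predicted traffic, and otherwise picks a V/F pair for it. The predictor, thresholds, and V/F table are illustrative stand-ins, not DozzNoC's trained models or its SIMO regulator behavior.

```python
# Toy per-router power-management sketch: power-gate an idle V/F domain
# (router + its outgoing links), otherwise scale voltage/frequency with the
# predicted traffic load. Thresholds and the V/F table are assumptions.
from dataclasses import dataclass

VF_TABLE = [(0.7, 1.0), (0.9, 1.5), (1.1, 2.0)]   # (volts, GHz), illustrative
GATE_THRESHOLD = 0.02                             # predicted load below this -> gate off

@dataclass
class RouterDecision:
    power_gated: bool
    volts: float = 0.0
    ghz: float = 0.0

def predict_load(history):
    """Stand-in predictor: exponentially weighted average of recent link loads."""
    alpha, pred = 0.5, 0.0
    for load in history:
        pred = alpha * load + (1 - alpha) * pred
    return pred

def manage_router(history):
    """Decide power gating or a V/F mode for one router's domain."""
    load = predict_load(history)
    if load < GATE_THRESHOLD:
        return RouterDecision(power_gated=True)      # cut static power
    if load < 0.3:
        return RouterDecision(False, *VF_TABLE[0])
    if load < 0.7:
        return RouterDecision(False, *VF_TABLE[1])
    return RouterDecision(False, *VF_TABLE[2])        # keep throughput

print(manage_router([0.0, 0.0, 0.01]))   # lightly used -> power-gated
print(manage_router([0.7, 0.8, 0.9]))    # busy -> highest V/F pair
```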
23. Limit of Hardware Solutions for Self-Protecting Fault-Tolerant NoCs
- Author
- Jacques H. Collet, Avinash Karanth, and Ahmed Louri
- Subjects
- Router, Triple modular redundancy, Distributed computing, Computer science, Reliability (computer networking), Fault tolerance, Base (topology), Chip, Computer hardware & architecture, Built-in self-test, Hardware and Architecture, Limit (music), Electrical and Electronic Engineering, Software, Computer hardware
- Abstract
We study the ultimate limits of hardware solutions for self-protection strategies against permanent faults in networks-on-chips (NoCs). NoC reliability is improved by replacing each base router with an augmented router which includes extra protection circuitry. We compare the protection achieved by the self-test and self-protect (STAP) architectures to that of triple modular redundancy with voting (TMR). Two STAP architectures are considered: in the first, a defective router self-disconnects from the network, while in the second it self-heals. In practice, none of the considered architectures (STAP or TMR) can tolerate all permanent faults, especially faults in the extra protection or voting circuitry, and consequently there will always be some unidentified defective augmented routers which transmit errors in an unpredictable manner. This study tackles this fundamental problem. Specifically, we study and determine the average percentage of residual unidentified defective routers (UDRs) and their impact on the overall reliability of the NoC in light of self-protection strategies. Our study shows that TMR is the most efficient solution to limit the average percentage of UDRs when typically fewer than 0.1 percent of base routers are defective. However, TMR is also the most cost-prohibitive and the least power-efficient. Above 1% of defective base routers, the STAP approaches are more efficient, although the protection efficiency decreases inexorably in very defective technologies (e.g., when 10% or more of the base routers are defective). For instance, if the chip includes 10% defective base routers, our study shows that there will remain on average 1% of UDRs, which poses a major challenge for NoC reliability.
- Published
- 2019
- Full Text
- View/download PDF
24. Ambipolar SB-FinFETs: A New Path to Ultra-Compact Sub-10 nm Logic Circuits
- Author
- Savas Kaya, Talha Furkan Canan, Hao Xin, Ahmed Louri, and Avinash Karanth
- Subjects
- Physics, Ambipolar diffusion, Transistor, Gate dielectric, NAND gate, Electronic, Optical and Magnetic Materials, CMOS, Logic gate, Optoelectronics, Electrical and Electronic Engineering, AND gate, NOR gate
- Abstract
Ultracompact sub-10-nm logic gates based on the ambipolar characteristics of Schottky-barrier (SB) FinFETs and the gate workfunction engineering (WFE) approach are introduced. Novel logic gate designs are proposed using WFE, whereby adjustment of the workfunction in the contacts as well as in the two independently biased FinFET gates leads to an unprecedented degree of freedom for logic functionality that has not been explored before. The use of SB contacts, along with the high-k gate dielectric and ultrathin body, bestows a high degree of short-channel immunity to the SB-FinFETs, with ambipolar current–voltage characteristics down to 5 nm. The unique trait of the proposed novel logic gates is to lower the CMOS transistor count by 50% and hence reduce overall area and power dissipation significantly. To illustrate this potential, an entirely novel conjugate (n/p channel) CMOS pass-gate transistor that can function as a two-transistor (2T) XOR and minimalist 2T NAND/NOR gates is designed and verified with TCAD simulations. Depending on the gate designed, TCAD simulations indicate that a judicious choice of gate workfunctions between 3.7 and 5.2 eV can lead to CMOS logic gates with a power-delay product (PDP) at the 5 × 10^-18 J level with immunity to ±0.1-eV workfunction variations. It is shown that WFE in independent-gate SB-FinFETs can lead to ultracompact logic circuits with a 50% reduction in area and up to 10 times reduction in power, without significant degradation to overall PDP performance, despite a slower switching response compared with conventional p-n junction FinFET designs.
- Published
- 2019
- Full Text
- View/download PDF
25. PIXEL: Photonic Neural Network Accelerator
- Author
- Avinash Karanth, Ahmed Louri, Dylan Wright, and Kyle Shiflett
- Subjects
- Silicon photonics, Pixel, Artificial neural network, Contextual image classification, Computer science, Static timing analysis, Optical computing, Computer hardware & architecture, Models of neural computation, Photonics, Computer hardware
- Abstract
Machine learning (ML) architectures such as deep neural networks (DNNs) have achieved unprecedented accuracy on modern applications such as image classification and speech recognition. With power dissipation becoming a major concern in ML architectures, computer architects have focused on designing both energy-efficient hardware platforms as well as optimizing ML algorithms. To dramatically reduce power consumption and increase parallelism in neural network accelerators, disruptive technology such as silicon photonics has been proposed, which can improve the performance-per-watt when compared to electrical implementations. In this paper, we propose PIXEL, a photonic neural network accelerator that efficiently implements the fundamental operation in neural computation, namely the multiply and accumulate (MAC) functionality, using photonic components such as microring resonators (MRRs) and Mach-Zehnder interferometers (MZIs). We design two versions of PIXEL: a hybrid version that multiplies optically and accumulates electrically, and a fully optical version that multiplies and accumulates optically. We perform a detailed power, area, and timing analysis of the different versions of photonic and electronic accelerators for different convolutional neural networks (AlexNet, VGG16, and others). Our results indicate a significant improvement in the energy-delay product for both PIXEL designs over traditional electrical designs (48.4% for the hybrid OE design and 73.9% for the all-optical OO design) while minimizing latency, at the cost of increased area over electrical designs.
- Published
- 2020
- Full Text
- View/download PDF
26. HREN: A Hybrid Reliable and Energy-Efficient Network-on-Chip Architecture
- Author
- Padmaja Bhamidipati and Avinash Karanth
- Subjects
- Human-Computer Interaction, Computer Science (miscellaneous), Computer Science Applications, Information Systems
- Published
- 2022
- Full Text
- View/download PDF
27. Guest Editors’ Introduction to the Special Issue on Machine Learning Architectures and Accelerators
- Author
- Avinash Karanth, Xuehai Qian, and Yanzhi Wang
- Subjects
- Computer science, Deep learning, Machine learning, Convolutional neural network, Theoretical Computer Science, Recurrent neural network, Software, Computational Theory and Mathematics, Hardware and Architecture, Reinforcement learning, Code generation, Compiler, Artificial intelligence, Transformer (machine learning model)
- Abstract
The twelve papers in this special section focus on machine learning architectures and accelerators. Deep learning, or deep neural networks (DNNs), as one of the most powerful machine learning techniques, has achieved extraordinary performance in computer vision and surveillance, speech recognition and natural language processing, healthcare and disease diagnosis, etc. Various forms of DNNs have been proposed, including convolutional neural networks, recurrent neural networks, deep reinforcement learning, and the Transformer model. Deep learning exhibits an offline training phase to derive the weight parameters from an extensive training dataset, as well as an online inference phase to perform classification/prediction/perception/control tasks based on the trained model. The papers in this section aim to find a convergence of software and hardware/architecture. They target DNN algorithms, parallel computing, and compiler code generation techniques that are hardware/architecture friendly, as well as computer architectures that are universal and consistently highly performant on a wide range of DNN algorithms and applications. In this co-design and co-optimization framework, we can mitigate the limitation of investigating only a single direction, shedding some light on the future of embedded, ubiquitous artificial intelligence.
- Published
- 2020
- Full Text
- View/download PDF
28. IntelliNoC
- Author
- Avinash Karanth, Razvan Bunescu, Ahmed Louri, and Ke Wang
- Subjects
- Mean time between failures, Computer science, Network packet, Computer hardware & architecture, Embedded system, Reinforcement learning, Holistic design, Latency (engineering), Architecture, Error detection and correction, Efficient energy use
- Abstract
As technology scales, Network-on-Chips (NoCs), currently being used for on-chip communication in manycore architectures, face several problems including high network latency, excessive power consumption, and low reliability. Simultaneously addressing these problems is proving to be difficult due to the explosion of the design space and the complexity of handling many trade-offs. In this paper, we propose IntelliNoC, an intelligent NoC design framework which introduces architectural innovations and uses reinforcement learning to manage the design complexity and simultaneously optimize performance, energy-efficiency, and reliability in a holistic manner. IntelliNoC integrates three NoC architectural techniques: (1) multifunction adaptive channels (MFACs) to improve energy-efficiency; (2) adaptive error detection/correction and re-transmission control to enhance reliability; and (3) a stress-relaxing bypass feature which dynamically powers off NoC components to prevent overheating and fatigue. To handle the complex dynamic interactions induced by these techniques, we train a dynamic control policy using Q-learning, with the goal of providing improved fault-tolerance and performance while reducing power consumption and area overhead. Simulation using PARSEC benchmarks shows that our proposed IntelliNoC design improves energy-efficiency by 67% and mean-time-to-failure (MTTF) by 77%, and decreases end-to-end packet latency by 32% and area requirements by 25% over a baseline NoC architecture. (A generic Q-learning sketch follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
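A generic tabular Q-learning loop, as a sketch of the kind of dynamic control-policy training mentioned above: the state is a coarse (load, error-rate) bucket, the actions toggle hypothetical knobs such as stronger error correction or a relaxing bypass, and the reward trades off latency, energy, and faults. The state space, actions, toy environment, and reward weights are all invented for illustration; this is not IntelliNoC's actual formulation.

```python
# Generic tabular Q-learning sketch for a NoC control policy. The environment,
# state buckets, actions, and reward weights are invented for illustration.
import random
from collections import defaultdict

ACTIONS = ["baseline", "strong_ecc", "bypass_relax"]   # hypothetical knobs
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)          # Q[(state, action)] -> estimated return

def toy_env_step(state, action):
    """Stand-in environment: returns (reward, next_state). Purely illustrative."""
    load, err = state
    latency = 1.0 + load - (0.3 if action == "bypass_relax" else 0.0)
    energy  = 1.0 + (0.4 if action == "strong_ecc" else 0.0)
    faults  = max(0.0, err - (0.5 if action == "strong_ecc" else 0.0))
    reward  = -(latency + energy + 2.0 * faults)        # weighted trade-off
    next_state = (random.randint(0, 2), random.randint(0, 2))  # next (load, err) bucket
    return reward, next_state

def choose_action(state):
    if random.random() < EPSILON:                       # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

state = (1, 1)
for _ in range(5000):                                   # online training steps
    action = choose_action(state)
    reward, next_state = toy_env_step(state, action)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    # Standard Q-learning update rule.
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state

print({a: round(Q[((1, 2), a)], 2) for a in ACTIONS})   # learned values for a high-error state
```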
29. High-performance, energy-efficient, fault-tolerant network-on-chip design using reinforcement learning
- Author
- Ke Wang, Razvan Bunescu, Avinash Karanth, and Ahmed Louri
- Subjects
- Computer science, Network packet, Retransmission, Word error rate, Fault tolerance, Chip, Computer hardware & architecture, Embedded system, System on a chip, Error detection and correction, Decoding methods, Efficient energy use
- Abstract
Network-on-Chips (NoCs) are becoming the standard communication fabric for multi-core and system-on-a-chip (SoC) architectures. As technology continues to scale, transistors and wires on the chip are becoming increasingly vulnerable to various fault mechanisms, especially timing errors, which degrade energy efficiency and performance for NoCs. Typical techniques for handling timing errors are reactive in nature, responding to faults after their occurrence. They rely on error detection/correction techniques which have resulted in excessive power consumption and degraded performance, since the error detection/correction hardware is constantly enabled. On the other hand, indiscriminately disabling error-handling hardware can induce more errors and intrusive retransmission traffic. Therefore, the challenge is to balance the trade-offs among error rate, packet retransmission, performance, and energy. In this paper, we propose a proactive fault-tolerant mechanism to optimize energy efficiency and performance with reinforcement learning (RL). First, we propose a new proactive error-handling technique comprised of a dynamic scheme for enabling per-router error detection/correction hardware and an effective retransmission mechanism. Second, we propose the use of RL to train the dynamic control policy with the goals of providing increased fault-tolerance, reduced power consumption, and improved performance as compared to conventional techniques. Our evaluation indicates that, on average, end-to-end packet latency is lowered by 55%, energy efficiency is improved by 64%, and retransmission caused by faults is reduced by 48% over reactive error correction techniques.
- Published
- 2019
- Full Text
- View/download PDF
30. RETUNES: Reliable and Energy-Efficient Network-on-Chip Architecture
- Author
- Avinash Karanth and Padmaja Bhamidipati
- Subjects
- Negative-bias temperature instability, Computer science, Transistor, Threshold voltage, Network on a chip, Dynamic demand, Electronic engineering, Network performance, Frequency scaling, Hot-carrier injection, Voltage
- Abstract
As the number of cores integrated on the chip increases, the design of a reliable and energy-efficient Network-on-Chip (NoC) to support the data movement needed by the multicores is becoming a critical challenge. NoC reliability is affected by several aging effects such as Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI), which vary the threshold voltage of the transistor, causing timing errors. Dynamic Voltage and Frequency Scaling (DVFS) along with Near-Threshold Voltage (NTV) scaling allows the transistor to operate close to the threshold voltage, thereby aggressively minimizing dynamic power consumption by reducing voltage/frequency and minimizing the threshold voltage variation, mitigating the aging process. However, the trade-off is increased latency and reduced reliability due to lower voltage margins. In this paper, we propose a unified approach called RETUNES: Reliable and Energy-Efficient NoC, where NTV scaling and reliability are both achieved while improving performance. RETUNES is a five-level voltage/frequency scaling scheme which decreases power consumption and threshold voltage variation (ΔVth) during low network load with higher reliability, and increases network performance during high network load with reduced reliability. In order to even out the wear-out and minimize the impact of aging in the NoC, we propose an adaptive routing algorithm in our design. RETUNES improves total power savings by nearly 2.5x and the energy-delay product (EDP) of the NoC by 3x for Splash-2 and PARSEC benchmarks on a 4 × 4 concentrated mesh architecture. (A toy sketch of multi-level V/F selection follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
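As a rough companion to the RETUNES description above, this sketch maps network load to one of five voltage/frequency levels and estimates the relative dynamic power from the standard P_dyn ∝ C·V²·f relation. The specific V/F pairs and load thresholds are made-up illustration values, not RETUNES's levels.

```python
# Toy sketch of five-level voltage/frequency selection with the standard
# dynamic-power relation P_dyn ~ C * V^2 * f. Levels and thresholds are assumed.
VF_LEVELS = [            # (volts, GHz); lowest level is near-threshold (assumed)
    (0.55, 0.5),
    (0.70, 1.0),
    (0.85, 1.5),
    (1.00, 2.0),
    (1.10, 2.5),
]
LOAD_THRESHOLDS = [0.1, 0.3, 0.5, 0.8]   # load fractions separating the five levels

def select_level(load):
    """Return the V/F level index for a given network load in [0, 1]."""
    for i, threshold in enumerate(LOAD_THRESHOLDS):
        if load < threshold:
            return i
    return len(VF_LEVELS) - 1

def relative_dynamic_power(level, c_eff=1.0):
    """Dynamic power relative to the highest level, using P ~ C * V^2 * f."""
    v, f = VF_LEVELS[level]
    v_max, f_max = VF_LEVELS[-1]
    return (c_eff * v**2 * f) / (c_eff * v_max**2 * f_max)

for load in (0.05, 0.4, 0.95):
    lvl = select_level(load)
    print(f"load={load:.2f} -> level {lvl}, "
          f"~{relative_dynamic_power(lvl):.2f}x of peak dynamic power")
```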