47 results on '"Cheng, Zhuo"'
Search Results
2. Design of Ultracompact Content Addressable Memory Exploiting 1T-1MTJ Cell
- Author
Cheng Zhuo, Zeyu Yang, Kai Ni, Mohsen Imani, Yuxuan Luo, Shaodi Wang, Deming Zhang, and Xunzhao Yin
- Subjects
Electrical and Electronic Engineering; Computer Graphics and Computer-Aided Design; Software
- Published
- 2023
3. A Fast Method to Estimate Through-Bump Current for Power Delivery Verification
- Author
Cheng Zhuo and Songyu Sun
- Subjects
Electrical and Electronic Engineering; Computer Graphics and Computer-Aided Design; Software
- Published
- 2023
4. Ferroelectric Ternary Content Addressable Memories for Energy-Efficient Associative Search
- Author
Xunzhao Yin, Yu Qian, Mohsen Imani, Kai Ni, Chao Li, Grace Li Zhang, Bing Li, Ulf Schlichtmann, and Cheng Zhuo
- Subjects
Electrical and Electronic Engineering; Computer Graphics and Computer-Aided Design; Software
- Published
- 2023
5. A DVFS Design and Simulation Framework Using Machine Learning Models
- Author
Di Gao, Cheng Zhuo, Yuan Cao, Tianhao Shen, Jin-fang Zhou, Xunzhao Yin, and Li Zhang
- Subjects
Power management; Profiling (computer programming); Computer science; Machine learning; Power (physics); Hardware and Architecture; Management methods; Artificial intelligence; Electrical and Electronic Engineering; Performance improvement; Mobile device; Design space; Software
- Abstract
Battery-powered mobile devices typically feature fast-varying workloads and a constrained power supply, demanding more efficient run-time power management. In this paper, we propose a DVFS design and simulation framework to explore the DVFS design space. The framework leverages existing tool chains and uses several machine learning models to support architectural simulation and run-time profiling. When the proposed framework is used to evaluate different management methods, the proposed deep-reinforcement-learning approach achieves a 5.8–7.3% performance improvement over the other alternatives.
- Published
- 2023
6. A fine-grained mixed precision DNN accelerator using a two-stage big–little core RISC-V MCU
- Author
Li Zhang, Qishen Lv, Di Gao, Xian Zhou, Wenchao Meng, Qinmin Yang, and Cheng Zhuo
- Subjects
Hardware and Architecture; Electrical and Electronic Engineering; Software
- Published
- 2023
7. GoodFloorplan: Graph Convolutional Network and Reinforcement Learning-Based Floorplanning
- Author
Xiaoqing Wen, Yi Kang, Cheng Zhuo, Hao Geng, Qi Xu, Bo Yuan, and Song Chen
- Subjects
Theoretical computer science; Optimization problem; Computer science; Heuristic; Reinforcement learning; Graph (abstract data type); Electronic design automation; Markov decision process; Electrical and Electronic Engineering; Physical design; Computer Graphics and Computer-Aided Design; Software; Floorplan
- Abstract
Electronic Design Automation (EDA) comprises a series of computationally difficult optimization problems that require substantial specialized knowledge as well as considerable trial-and-error effort. However, open challenges, including long simulation runtimes and a lack of generalization, continue to restrict the application of existing EDA tools. Recently, learning-based algorithms, especially reinforcement learning (RL), have been successfully applied to various combinatorial optimization problems by automatically acquiring knowledge from past experience. In this paper, we formulate floorplanning, the first stage of the physical design flow, as a Markov Decision Process (MDP). An end-to-end learning-based floorplanning framework, GoodFloorplan, which combines a graph convolutional network (GCN) and RL, is proposed to explore the design space. Experimental results demonstrate that, compared with state-of-the-art heuristic-based floorplanners, GoodFloorplan achieves better area and wirelength.
- Published
- 2022
8. Improving Fault Tolerance for Reliable DNN Using Boundary-Aware Activation
- Author
Wei Jiang, Cheng Zhuo, Jinyu Zhan, Xunzhao Yin, Ruoxu Sun, and Yucheng Jiang
- Subjects
Computer science; Control theory; Boundary (topology); Fault tolerance; Electrical and Electronic Engineering; Computer Graphics and Computer-Aided Design; Software
- Published
- 2022
9. PAM: A Piecewise-Linearly-Approximated Floating-Point Multiplier With Unbiasedness and Configurability
- Author
Cheng Zhuo, Xunzhao Yin, Chuangtao Chen, Mohsen Imani, and Weikang Qian
- Subjects
Floating point; Optimization problem; Computer science; Topology; Theoretical Computer Science; Computational Theory and Mathematics; Hardware and Architecture; Piecewise; Multiplier (economics); Multiplication; Enhanced Data Rates for GSM Evolution; Linear independence; Hardware_ARITHMETICANDLOGICSTRUCTURES; Software; Efficient energy use
- Abstract
Approximate computing is a promising way to improve energy efficiency for IoT devices at the edge. This work proposes a piecewise-linearly-approximated, unbiased floating-point approximate multiplier with run-time configurability. We provide a theoretically sound formulation that turns multiplication approximation into an optimization problem. Building on this formulation and its findings, a multi-level architecture is proposed that readily incorporates run-time configurability and module execution parallelism. Finally, the multiplier is further optimized to reduce circuit implementation complexity, so that its complexity grows linearly with the precision requirement rather than quadratically or exponentially as in prior work. Compared with the prior state-of-the-art approximate floating-point multiplier, ApproxLP [1], the proposed multiplier is better in every aspect evaluated, including accuracy, area, and delay. When it replaces a full-precision floating-point multiplier in a GPU, the proposed design improves energy efficiency for various edge computing tasks. Even with Level 1 approximation, it improves energy efficiency by up to 20× for machine learning on CIFAR-10, with almost negligible accuracy loss.
- Published
- 2022
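The abstract does not give PAM's actual formulation. As a hedged illustration of the general idea only (our own construction, not the paper's design), the sketch below approximates the product of two floating-point mantissas, (1 + a)(1 + b), by replacing the nonlinear term a·b with a piecewise tangent-plane approximation over a k × k grid of segments. Assuming a and b are independent and uniform on [0, 1), the residual in each cell averages to zero, so the approximation is unbiased, and more segments lower the worst-case error. The function names and the parameter `k` are hypothetical.

```python
import random


def approx_mantissa_product(a: float, b: float, k: int = 2) -> float:
    """Approximate (1 + a) * (1 + b) for mantissa fractions a, b in [0, 1).

    Illustrative sketch only (not PAM's formulation). The exact product is
    1 + a + b + a*b; only the nonlinear a*b term is approximated. The unit
    square is split into k x k cells, and inside each cell a*b is replaced by
    its tangent plane at the cell center (am, bm):
        a*b ~= am*b + bm*a - am*bm
    The residual is (a - am) * (b - bm), which averages to zero over a cell
    when a and b are independent and uniform, so the estimate is unbiased.
    """
    ia = min(int(a * k), k - 1)  # segment index of a
    ib = min(int(b * k), k - 1)  # segment index of b
    am, bm = (ia + 0.5) / k, (ib + 0.5) / k
    return 1.0 + a + b + (am * b + bm * a - am * bm)


def error_stats(k: int, samples: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Return (mean error, max absolute error) over random uniform mantissas."""
    rng = random.Random(seed)
    total, worst = 0.0, 0.0
    for _ in range(samples):
        a, b = rng.random(), rng.random()
        err = approx_mantissa_product(a, b, k) - (1 + a) * (1 + b)
        total += err
        worst = max(worst, abs(err))
    return total / samples, worst


if __name__ == "__main__":
    for k in (1, 2, 4):
        mean, worst = error_stats(k)
        print(f"k={k}: mean error {mean:+.5f}, max |error| {worst:.5f}")
```

In this toy model the worst-case error is bounded by (1/(2k))² = 1/(4k²), so doubling the number of segments per axis cuts it by 4×, at the cost of more segments.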
10. Computing-In-Memory Using Ferroelectrics: From Single- to Multi-Input Logic
- Author
Di Gao, Michael Niemier, Cheng Zhuo, Chao Li, Dayane Reis, Mohsen Imani, Qingrong Huang, Xunzhao Yin, and Xiaobo Sharon Hu
- Subjects
Out of memory; CMOS; Computer engineering; Hardware and Architecture; Computer science; Computation; Process (computing); Electrical and Electronic Engineering; Performance improvement; Software; Energy (signal processing); Efficient energy use; Data transmission
- Abstract
As a promising solution to the memory wall, computing-in-memory (CiM) designs using CMOS or emerging non-volatile memories (NVMs) have been proposed to support bit-wise Boolean logic and arithmetic operations. Unlike conventional execution, which reads data out of memory, processes it sequentially, and sends the outputs back, CiM exploits parallel in-memory bit-wise computation and thus avoids unnecessary data transfer, saving both area and power. This paper proposes CiM designs based on ferroelectric FETs (FeFETs) for bit-wise Boolean logic and multi-input majority (MAJ) operations. By exploiting the merged memory-and-switch property of FeFETs, our design achieves both high storage density and energy-efficient computation. Evaluation results demonstrate that the proposed CiM designs achieve significant energy savings and performance improvements, up to ~31% for Boolean logic and ~11× for majority vote, compared with other NVM counterparts.
- Published
- 2022
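To make the logic function concrete, the sketch below is a plain software reference model of bit-wise multi-input majority (MAJ), the operation the FeFET CiM design evaluates in memory. It does not model the FeFET array itself; the function name and the `width` parameter are illustrative assumptions. It also demonstrates a standard identity that makes MAJ useful for logic: a 3-input MAJ with one input tied to all zeros computes AND, and with one input tied to all ones computes OR.

```python
def bitwise_majority(*words: int, width: int = 8) -> int:
    """Bit-wise majority of an odd number of unsigned `width`-bit words.

    Output bit i is 1 when more than half of the inputs have a 1 at bit i.
    This is a software reference model of the logic function only.
    """
    if not words or len(words) % 2 == 0:
        raise ValueError("MAJ needs an odd, nonzero number of inputs")
    result = 0
    for bit in range(width):
        ones = sum((w >> bit) & 1 for w in words)  # count 1s in this bit column
        if 2 * ones > len(words):
            result |= 1 << bit
    return result


a, b = 0b1100_1010, 0b1010_0110
all_ones = (1 << 8) - 1
print(bin(bitwise_majority(a, b, 0)))         # AND of a and b: 0b10000010
print(bin(bitwise_majority(a, b, all_ones)))  # OR of a and b:  0b11101110
```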
11. ANT-UNet: Accurate and Noise-Tolerant Segmentation for Pathology Image Processing
- Author
Yufei Chen, Tingtao Li, Qinming Zhang, Wei Mao, Nan Guan, Mei Tian, Hao Yu, and Cheng Zhuo
- Subjects
Hardware and Architecture; Electrical and Electronic Engineering; Software
- Abstract
Pathology image segmentation is an essential step in the early detection and diagnosis of various diseases. Because of the complexity of pathology images, precise segmentation is not a trivial task. Deep learning has recently proven to be an effective option for pathology image processing; however, its efficiency is severely limited by inconsistent annotation quality. In this article, we propose an accurate and noise-tolerant segmentation approach to overcome these issues. The approach consists of two main parts: a preprocessing module for data augmentation and a new neural network architecture, ANT-UNet. Experimental results demonstrate that, even on a noisy dataset, the proposed approach achieves more accurate segmentation, with a 6% to 35% accuracy improvement over other commonly used segmentation methods. In addition, the proposed architecture is hardware friendly: it reduces the number of parameters to one-tenth of the original and achieves a 1.7× speed-up.
- Published
- 2021
12. Attention cutting and padding learning for fine-grained image recognition
- Author
Hao Luo, Hongjian Li, Duan Xiaolin, He Mingxuan, Xiangyan Zeng, and Cheng Zhuo
- Subjects
Computer Networks and Communications; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Semantics; Padding; Convolutional neural network; Field (computer science); Image (mathematics); Task (computing); Hardware and Architecture; Media Technology; Computer vision; Saliency map; Artificial intelligence; Intelligent transportation system; Software
- Abstract
Fine-grained image recognition is an important task in computer vision. Because the differences between categories are very small, fine-grained recognition depends heavily on local features. In this paper, a novel “Attention Cutting And Padding Learning” method is proposed to learn local features. First, the image is fed to a convolutional neural network (CNN) to obtain a saliency map, from which the attention image is derived. Second, the attention image is cut into $N \times N$ sub-images, each sub-image is zero-padded with padding size $P$, and all padded sub-images are spliced into a Cutting And Padding image. Finally, the Cutting And Padding image and the attention image are both fed to CNNs for training. With this method, more local features can be learned while the high-level semantics are preserved. Experimental results show recognition accuracies of 87.9%, 94.6%, and 92.4% on the CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets, respectively. Moreover, the method can readily be applied to automatic biodiversity monitoring, intelligent retail, intelligent transportation, and other fields to improve recognition accuracy without changing the network structure.
- Published
- 2021
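The cutting-and-padding step in this abstract is plain array manipulation. The sketch below implements it with NumPy under stated assumptions: the attention image's height and width are divisible by N, each sub-image is zero-padded by P pixels on all four sides, and the padded tiles are spliced back in their original grid order. Any resizing to the CNN input size is not described in the abstract and is omitted. The function name `cut_and_pad` is hypothetical.

```python
import numpy as np


def cut_and_pad(image: np.ndarray, n: int, p: int) -> np.ndarray:
    """Cut an H x W (x C) attention image into n x n tiles, zero-pad each
    tile by p pixels on every side, and splice the tiles back together."""
    h, w = image.shape[:2]
    if h % n or w % n:
        raise ValueError("height and width must be divisible by n")
    th, tw = h // n, w // n
    # Pad only the two spatial axes; leave any channel axis untouched.
    pad_width = ((p, p), (p, p)) + ((0, 0),) * (image.ndim - 2)
    rows = []
    for i in range(n):
        tiles = [
            np.pad(image[i * th:(i + 1) * th, j * tw:(j + 1) * tw],
                   pad_width, mode="constant", constant_values=0)
            for j in range(n)
        ]
        rows.append(np.concatenate(tiles, axis=1))  # splice one row of tiles
    return np.concatenate(rows, axis=0)             # stack the rows


attention = np.ones((224, 224, 3), dtype=np.float32)
out = cut_and_pad(attention, n=4, p=8)
print(out.shape)  # (288, 288, 3)
```

Each 56 × 56 tile grows to 72 × 72 after padding, so the spliced image is 4 × 72 = 288 pixels on a side.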
13. A Reconfigurable Multiplier for Signed Multiplications with Asymmetric Bit-Widths
- Author
Bing Li, Xunzhao Yin, Grace Li Zhang, Cheng Zhuo, Chuliang Guo, Li Zhang, Weikang Qian, and Xian Zhou
- Subjects
Approximate computing; Computer science; 020208 electrical & electronic engineering; 02 engineering and technology; 020202 computer hardware & architecture; Embedded applications; Bit (horse); Hardware and Architecture; 0202 electrical engineering, electronic engineering, information engineering; Multiplier (economics); Hardware_ARITHMETICANDLOGICSTRUCTURES; Electrical and Electronic Engineering; Arithmetic; Software
- Abstract
Multiplications are common in quantized CNNs, filters, reconfigurable cores, and so on, which are widely deployed in mobile and embedded applications. Most multipliers are designed for symmetric bit-widths, i.e., n-by-n-bit multiplication. This causes extra area overhead and performance loss when m-by-n-bit multiplications (m > n) are deployed in the same hardware design, resulting in inefficient multiplication. A reconfigurable multiplier that accommodates operands with both symmetric and asymmetric bit-widths is therefore highly desirable, but challenging to design. In this work, we propose a reconfigurable approximate multiplier that supports multiplication at various precisions, i.e., bit-widths. Prior work on approximate adders assumes a uniform weight distribution with bit-wise independence, whereas scenarios such as a quantized CNN may have a centralized, Gaussian-like weight distribution with correlated adjacent bits. Thus, a new block-based approximate adder is also proposed as part of the multiplier to ensure energy-efficient operation that is aware of this bit-wise correlation. Experimental results show that the proposed approximate adder reduces the error rate by 76% to 98% relative to a state-of-the-art approximate adder in Gaussian-like distribution scenarios. Evaluation results also show that the proposed multiplier is 19% faster and saves 22% more power than a Xilinx multiplier IP at the same bit precision, and, when deployed in a Gaussian filter for image processing, achieves a peak signal-to-noise ratio of 23.94 dB, comparable to the 24.10 dB of the accurate multiplier.
- Published
- 2021
14. Energy-Efficient Real-Time UAV Object Detection on Embedded Platforms
- Author
Zhiguo Shi, Jianing Deng, and Cheng Zhuo
- Subjects
Computer science; Detector; Feature extraction; Real-time computing; 02 engineering and technology; Energy consumption; Computer Graphics and Computer-Aided Design; Object detection; 020202 computer hardware & architecture; Task (computing); 0202 electrical engineering, electronic engineering, information engineering; Task analysis; Key (cryptography); Electrical and Electronic Engineering; Software; Efficient energy use
- Abstract
Recent advances in unmanned aerial vehicle (UAV) technology have enabled diverse vision-related outdoor applications, among which visual object detection is a crucial task. However, deploying detectors on embedded devices is difficult because of the trade-offs among energy consumption, accuracy, and speed. In this article, we address several key challenges spanning the platform, the application, and the system, and propose an energy-efficient system for real-time UAV object detection on an embedded platform. The proposed system achieves 28.5 FPS and an energy efficiency of 2.7 FPS/W on the dataset of the 2018 Low-Power Object Detection Challenge (LPODC).
- Published
- 2020
15. Hierarchical saliency mapping for weakly supervised object localization based on class activation mapping
- Author
Hongjian Li, Cheng Zhuo, Duan Xiaolin, Meiqi Wang, and Xiangyan Zeng
- Subjects
Computer Networks and Communications; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 020207 software engineering; Pattern recognition; 02 engineering and technology; Object (computer science); Class (biology); Field (computer science); Hardware and Architecture; Salience (neuroscience); Video tracking; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; Saliency map; Artificial intelligence; Pyramid (image processing); Software
- Abstract
Weakly supervised object localization is a fundamental research problem in computer vision. In this paper, a hierarchical saliency mapping network for object localization is proposed, designed to avoid missing detailed information about potential objects. Starting from a classical convolutional network, we remove the fully connected part and add multiple information extraction branches. The network extracts information from convolution layers at different scales to generate Hierarchical Saliency Maps. These maps, which include the Hierarchical-Class Activation Map and the Hierarchical-Spatial Pyramid Saliency Map, fuse deep-level and low-level features to locate objects. The method is tested on Caltech-UCSD Birds 200, Caltech101, and ImageNet, and it improves localization accuracy compared with the Class Activation Map and the Spatial Pyramid Saliency Map. The method can also be used for fine-grained classification, object tracking, and other applications.
- Published
- 2020
16. Noise-Aware DVFS for Efficient Transitions on Battery-Powered IoT Devices
- Author
Shaoheng Luo, Jiang Hu, Houle Gan, Cheng Zhuo, and Zhiguo Shi
- Subjects
Schedule; Computer science; Clock rate; Electrical engineering; 02 engineering and technology; Computer Graphics and Computer-Aided Design; Capacitance; 020202 computer hardware & architecture; 0202 electrical engineering, electronic engineering, information engineering; Electrical and Electronic Engineering; Internet of Things; Software; Decoupling (electronics)
- Abstract
Low-power systems-on-chip (SoCs) are now at the heart of Internet-of-Things (IoT) devices, which are well known for their bursty workloads and limited energy storage, usually in the form of tiny batteries. To extend battery lifetime, dynamic voltage and frequency scaling (DVFS) has become an essential technique in such SoCs. With continuously decreasing supply levels, noise margins in these devices are already being squeezed. During a DVFS transition, the large current that accompanies the clock speed change flows into or out of the clock networks within a few clock cycles, inducing large $L\,di/dt$ noise and thereby stressing the power delivery system (PDS). Because of limited area and cost targets, adding decoupling capacitance to mitigate such noise is usually difficult. A common approach, known as clock skipping, gradually introduces or removes clock cycles to increase or decrease the clock frequency in steps. However, this technique may lengthen the DVFS transition time and still cannot guarantee minimal noise. In this paper, we propose a new noise-aware DVFS sequence optimization technique that formulates clock skipping sequence optimization as a mixed 0/1 programming problem. The method is also extended to schedule extensive wake-up activities on different clock domains for the same purpose. Experiments show that the optimized sequence significantly mitigates noise within the desired transition time, thereby saving both power and energy.
- Published
- 2020
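The trade-off this abstract describes between transition time and $L\,di/dt$ noise can be seen in a first-order toy model. The sketch below is not the paper's mixed 0/1 programming formulation; it assumes (hypothetically) that clock-network current is proportional to the fraction of enabled clock edges in each m-cycle window and that each change in activity happens within one window, so the induced noise is roughly L·ΔI/Δt. The function name and all parameter values are made up for illustration.

```python
def ldi_dt_noise(enabled_per_window: list[int], m: int, i_full: float,
                 inductance: float, window_s: float) -> list[float]:
    """First-order L*di/dt estimate (volts) for each window-to-window step.

    enabled_per_window[t] is the number of clock edges kept (not skipped)
    out of m base-clock cycles in window t. Current is assumed to scale
    linearly with that fraction: I_t = enabled / m * i_full.
    """
    currents = [k / m * i_full for k in enabled_per_window]
    return [inductance * (after - before) / window_s
            for before, after in zip(currents, currents[1:])]


# Hypothetical values: 8-cycle windows of a 1 GHz base clock, 0.1 A of
# clock current at full speed, 1 nH of effective PDS inductance.
M, I_FULL, L_PDS, T_WIN = 8, 0.1, 1e-9, 8e-9

abrupt = ldi_dt_noise([4, 8], M, I_FULL, L_PDS, T_WIN)            # half to full speed at once
gradual = ldi_dt_noise([4, 5, 6, 7, 8], M, I_FULL, L_PDS, T_WIN)  # one more edge per window

print(f"abrupt:  peak {max(abrupt) * 1e3:.2f} mV over {len(abrupt)} window(s)")
print(f"gradual: peak {max(gradual) * 1e3:.2f} mV over {len(gradual)} window(s)")
```

In this toy model, spreading the same frequency change over four windows cuts the peak noise to a quarter but takes four times as long, which is the tension that noise-aware sequence optimization targets.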
17. Dynamic Frequency Scaling Aware Opportunistic Through-Silicon-Via Inductor Utilization in Resonant Clocking
- Author
Cheng Zhuo, Leibo Liu, Yiyu Shi, and Umamaheswara Rao Tida
- Subjects
Through-silicon via; Maximum power principle; Computer science; Dynamic frequency scaling; Integrated circuit; Inductor; Computer Graphics and Computer-Aided Design; Noise (electronics); Inductance; Capacitor; Electronic engineering; Electrical and Electronic Engineering; Software
- Abstract
The LC resonant clock is a viable option for low-power on-chip clock distribution. A major factor limiting its implementation is the large area overhead of conventional spiral inductors. On the other hand, idle through-silicon vias (TSVs) in 3-D integrated circuits (3-D ICs) can form vertical inductors with a minimal footprint and little noise coupling to horizontal traces, which makes them particularly suitable for LC resonant clocks. However, because of strict constraints on the locations of idle TSVs, the usable TSV inductors are constrained in location, inductance, and quality factor. The problem is further complicated by dynamic frequency scaling (DFS), where the resonant tanks must accommodate multiple clock frequencies. Moreover, these TSV inductors can have any orientation and spacing, causing complicated coupling effects. In this paper, we first present a novel scheme that opportunistically uses idle TSVs to form inductors in the LC resonant clock of 3-D ICs for maximum power reduction in the clock-distribution network (CDN) at a fixed frequency, and then extend it to DFS schemes. Experimental results on a few industrial designs show that, for resonant CDNs operating at a fixed frequency of 3 GHz, power consumption is reduced by up to 47.9% compared with conventional CDNs without resonant clocking. For resonant CDNs with the DFS scheme, power consumption is reduced by up to 42.3%, 39.0%, 38.3%, 34.3%, and 28.6% at 3, 2.5, 2, 1.5, and 1 GHz, respectively, compared with CDNs without resonant clocking. Compared with CDNs using conventional spiral inductors, our scheme with TSV inductors reduces the inductor footprint by up to 6.30× at the same power consumption.
- Published
- 2020
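As background on why DFS complicates resonant clocking: an ideal LC tank resonates at f = 1/(2π√(LC)), so the inductance needed to resonate a given clock capacitance scales as 1/f². The sketch below evaluates that textbook relation at the five frequencies reported in this abstract, assuming a hypothetical 1 pF of effective clock capacitance per tank; it ignores quality factor, TSV coupling, and every other effect the paper models.

```python
import math


def resonant_inductance(freq_hz: float, cap_f: float) -> float:
    """Inductance (H) that resonates with cap_f at freq_hz: L = 1 / ((2*pi*f)^2 * C)."""
    return 1.0 / ((2 * math.pi * freq_hz) ** 2 * cap_f)


def resonant_frequency(ind_h: float, cap_f: float) -> float:
    """Ideal LC resonant frequency (Hz): f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(ind_h * cap_f))


C_CLK = 1e-12  # hypothetical effective clock capacitance per tank (1 pF)
for f_ghz in (3.0, 2.5, 2.0, 1.5, 1.0):
    l_nh = resonant_inductance(f_ghz * 1e9, C_CLK) * 1e9
    print(f"{f_ghz:.1f} GHz -> {l_nh:.2f} nH")
```

With the capacitance held fixed, resonating at 1 GHz needs 9× the inductance required at 3 GHz, which is why a tank sized for one frequency cannot simply serve every DFS level.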
18. VirtualSync+: Timing Optimization with Virtual Synchronization
- Author
Grace Li Zhang, Bing Li, Xing Huang, Xunzhao Yin, Cheng Zhuo, Masanori Hashimoto, and Ulf Schlichtmann
- Subjects
FOS: Computer and information sciences; Hardware Architecture (cs.AR); Electrical and Electronic Engineering; Computer Science - Hardware Architecture; Computer Graphics and Computer-Aided Design; Software; Hardware_LOGICDESIGN
- Abstract
In digital circuit designs, sequential components such as flip-flops are used to synchronize signal propagation. Logic computations are aligned at, and thus isolated by, flip-flop stages. Although this fully synchronous style can reduce design effort significantly, it may hurt circuit performance, because sequential components can only add delay to signal propagation and never accelerate it. In this paper, we propose a new timing model, VirtualSync+, in which signals, especially those along critical paths, are allowed to propagate through several sequential stages without flip-flops. Timing constraints are still satisfied at the boundary of the optimized circuit to maintain a consistent interface with existing designs. By removing the clock-to-q delays and setup-time requirements of flip-flops on critical paths, circuit performance can be pushed beyond the limit of traditional sequential designs. In addition, we further enhance VirtualSync+ optimization by fine-tuning with commercial design tools, e.g., Design Compiler from Synopsys, to achieve more accurate results. Experimental results demonstrate that circuit performance can be improved by up to 4% (1.5% on average) compared with extreme retiming and sizing, while the area increase remains negligible, pushing timing performance beyond the limit of traditional sequential designs. The results also show that, compared with circuits after retiming and sizing, circuits with VirtualSync+ achieve better timing performance at the same area cost, or smaller area at the same clock period.
- Published
- 2022
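A simplified timing model shows where the gain described in the VirtualSync+ abstract comes from. Assume two combinational blocks with delays $d_1$ and $d_2$, flip-flops with clock-to-q delay $t_{cq}$ and setup time $t_{su}$, and ignore clock skew and hold (short-path) constraints, which a real flow must still handle. With a flip-flop between the blocks, each block must fit in one clock period $T$; if the intermediate flip-flop is removed and the signal may take two cycles through both blocks, only the combined path is constrained. The delay values are hypothetical.

```latex
\begin{aligned}
\text{with FF:}\quad & T \ge t_{cq} + \max(d_1, d_2) + t_{su} \\
\text{without FF:}\quad & 2T \ge t_{cq} + d_1 + d_2 + t_{su} \\
\text{example:}\quad & t_{cq}=50,\ t_{su}=30,\ d_1=520,\ d_2=400\ \text{(ps)} \\
& T_{\text{FF}} = 50 + 520 + 30 = 600\ \text{ps},\quad
  T_{\text{no FF}} = \tfrac{50 + 520 + 400 + 30}{2} = 500\ \text{ps}
\end{aligned}
```

The 200 ps saved over two cycles splits into one removed $t_{cq} + t_{su}$ overhead (80 ps) and the 120 ps of slack in the shorter block that now absorbs part of the longer one.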
19. FeFET Based In-Memory Hyperdimensional Encoding Design
- Author
Qingrong Huang, Zeyu Yang, Kai Ni, Mohsen Imani, Cheng Zhuo, and Xunzhao Yin
- Subjects
Electrical and Electronic Engineering; Computer Graphics and Computer-Aided Design; Software
- Published
- 2023
20. Partial Unbalanced Feature Transport for Cross-Modality Cardiac Image Segmentation
- Author
Shunjie Dong, Zixuan Pan, Yu Fu, Dongwei Xu, Kuangyu Shi, Qianqian Yang, Yiyu Shi, and Cheng Zhuo
- Subjects
Radiological and Ultrasound Technology; Electrical and Electronic Engineering; Software; Computer Science Applications
- Published
- 2023
21. Optimal design of a low-power, phase-switching modulator for implantable medical applications
- Author
Yiyu Shi, Dawei Li, Cheng Zhuo, Xiaowei Xu, Leibo Liu, and Li Zhang
- Subjects
Computer science; 020208 electrical & electronic engineering; Transmitter; Hardware_PERFORMANCEANDRELIABILITY; 02 engineering and technology; Chip; 020202 computer hardware & architecture; Power (physics); Capacitor; Transmission (telecommunications); Hardware and Architecture; Modulation; Filter (video); Hardware_INTEGRATEDCIRCUITS; 0202 electrical engineering, electronic engineering, information engineering; Electronic engineering; Electrical and Electronic Engineering; Resistor; Software
- Abstract
Since it was first proposed in JSSC [1], phase-switching modulation has proven to be a viable transmission scheme. In this paper, we explore the maximum data rate of phase-switching modulation and propose a modified quadrature oscillator with a self-bias circuit to isolate disturbances in the power supply and ground. The dividers and filter are also optimized to minimize parasitic resistance and capacitance. The prototype is fabricated in a 0.18 μm CMOS process, with an active chip area of about 0.13 mm × 0.35 mm. The modulator consumes only 2 mW in total, with an energy efficiency of 80 pJ/bit. It has been successfully embedded in a direct-up-conversion transmitter for medical applications.
- Published
- 2019
22. A Cross-Layer Framework for Temporal Power and Supply Noise Prediction
- Author
Cheng Zhuo, Yaguang Li, and Pingqiang Zhou
- Subjects
Noise margin; Computer science; Robustness (computer science); 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology; Autoregressive integrated moving average; Electrical and Electronic Engineering; Chip; Computer Graphics and Computer-Aided Design; Moving-average model; Software; Simulation; 020202 computer hardware & architecture
- Abstract
In modern microprocessor and SoC designs, supply noise margin has been significantly reduced by continuously decreasing supply voltage levels. At the same time, with increasing current density, chips may see larger supply noise variations at various locations and over time. As a result, chip robustness and reliability inevitably deteriorate with more frequent supply noise emergencies, so an efficient supply noise prediction method is crucial to enhance design robustness. State-of-the-art solutions either build a spatial noise estimation framework at the layout level using a limited number of distributed physical noise sensors, or develop emergency predictors at the architecture level, thereby ignoring back-end power delivery details. In this paper, we propose a cross-layer framework for temporal supply noise prediction. Our method not only accounts for the temporal characteristics of workload execution at the micro-architecture level but also incorporates the circuit-level power delivery model into this system-level prediction. To enable on-the-fly noise prediction, we first bridge the gap between system-level workload and micro-architecture-level power by employing an ordinary least-squares-based power estimation model and an adaptive autoregressive integrated moving average (ARIMA)-based power prediction model. A layout-level supply noise model is then developed to capture the correlation between micro-architecture-level power and layout-level supply noise. Compared with existing methods, the proposed ARIMA-based power model improves prediction performance by up to 37.5%/63.0% on X86/ARM. Moreover, compared with SPICE simulation, our framework estimates present supply noise with an average error of 0.005% and predicts future supply noise with an average error of 1.58%/1.17% for the X86/ARM architectures.
- Published
- 2019
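The two-step modeling chain in this abstract (counters to power, then power forecasting) can be sketched with NumPy. The first function fits an ordinary least-squares model mapping per-interval activity counters to power; the forecaster uses a fixed-order ARIMA(1,1,0) model, i.e., an AR(1) model on first differences fitted by least squares. This is a deliberate simplification: the paper uses an adaptive ARIMA model, and the counter features, model order, synthetic data, and function names here are all assumptions.

```python
import numpy as np


def fit_power_model(counters: np.ndarray, power: np.ndarray) -> np.ndarray:
    """OLS fit of power ~ counters @ w + b. Returns [w_1, ..., w_k, b]."""
    design = np.column_stack([counters, np.ones(len(counters))])
    coef, *_ = np.linalg.lstsq(design, power, rcond=None)
    return coef


def estimate_power(counters: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Apply a fitted OLS power model to new counter samples."""
    return counters @ coef[:-1] + coef[-1]


def forecast_arima110(power_trace: np.ndarray) -> float:
    """One-step-ahead ARIMA(1,1,0) forecast; needs at least 3 samples."""
    diffs = np.diff(power_trace)              # d = 1: work on first differences
    lagged, current = diffs[:-1], diffs[1:]
    denom = float(np.dot(lagged, lagged))
    phi = float(np.dot(lagged, current)) / denom if denom > 0 else 0.0
    return float(power_trace[-1] + phi * diffs[-1])  # undo the differencing


rng = np.random.default_rng(0)
counters = rng.uniform(0.0, 1.0, size=(200, 2))  # hypothetical activity counters
power = 2.0 * counters[:, 0] + 0.5 * counters[:, 1] + 3.0  # synthetic, noise-free
coef = fit_power_model(counters, power)
trace = estimate_power(counters[:20], coef)
next_power = forecast_arima110(trace)
```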
23. Single-Inductor–Multiple-Tier Regulation: TSV-Inductor-Based On-Chip Buck Converters for 3-D IC Power Delivery
- Author
Umamaheswara Rao Tida, Cheng Zhuo, and Yiyu Shi
- Subjects
Buck converter; Computer science; Capacitive sensing; Electrical engineering; Hardware_PERFORMANCEANDRELIABILITY; 02 engineering and technology; Integrated circuit; Inductor; 020202 computer hardware & architecture; Power (physics); Hardware_GENERAL; Hardware and Architecture; Hardware_INTEGRATEDCIRCUITS; 0202 electrical engineering, electronic engineering, information engineering; Electrical and Electronic Engineering; Software; Voltage
- Abstract
On-chip inductive buck converters are gaining popularity because of their higher efficiency at higher load currents compared with their linear and capacitive counterparts. Through-silicon-via inductors (TSV-Inductors) in 3-D integrated circuit (3-D IC) technology can be used to implement buck converters while reducing the metal resources consumed by the inductor. However, in 3-D ICs, the regulated voltage from a buck converter may be required by multiple tiers, and simply designing one buck converter per tier is clearly resource-consuming. This paper exploits the features of TSV-Inductors together with temporal and spatial sharing techniques to enable single-inductor–multiple-tier regulation for 3-D ICs. Experimental results suggest that, under the same design specifications and resource consumption, the TSV-Inductor-based time-multiplexing buck converter (TMBC) and shared-inductor buck converter (SIBC) increase efficiency by up to 15% and 25%, respectively, compared with the conventional power delivery scheme using one TSV-Inductor-based buck converter per tier. Moreover, the ripples of the TSV-Inductor-based TMBC and SIBC can be reduced by up to 3× and 6×, respectively. To the best of our knowledge, this is the first work exploring buck converter sharing among multiple tiers in 3-D ICs.
- Published
- 2019
24. From Layout to System: Early Stage Power Delivery and Architecture Co-Exploration
- Author
Cheng Zhuo, Kassan Unda, Yiyu Shi, and Wei-Kai Shih
- Subjects
Computer science; Reliability (computer networking); Oxide; Power integrity; 02 engineering and technology; Static analysis; Computer Graphics and Computer-Aided Design; 020202 computer hardware & architecture; Reliability engineering; Power (physics); Noise margin; 0202 electrical engineering, electronic engineering, information engineering; Systems architecture; Electrical and Electronic Engineering; Architecture; Software
- Abstract
With the reduced noise margin brought by relentless technology scaling, power integrity assurance has become more challenging than ever. Meanwhile, traditional design methodologies typically focus on a single design layer with little cross-layer interaction, potentially introducing unnecessary guard-bands and wasting significant design resources. Both issues call for a cross-layer framework for the co-exploration of power delivery (PD) and system architecture, especially in the early design stage, where there is more freedom for design and optimization. Unfortunately, no such framework yet exists in the literature. As a step forward, this paper provides a run-time simulation framework for both PD and architecture that captures their interactions. Enabled by the proposed recursive run-time PD model, it achieves less than 1% deviation from SPICE for an entire PD system simulation. Moreover, with seamless interaction among the architecture, power, and PD simulators, it can simulate actual benchmarks within reasonable time. Experimental results on the PARSEC suite demonstrate the framework’s capability to uncover the co-effects of PD and architecture for early-stage design optimization. They also expose multiple instances of over-pessimism in traditional PD methodologies. Finally, the framework can investigate the impact of dynamic noise on system-level oxide breakdown reliability, showing 31%–92% deviations in lifetime estimation from typical static analysis.
- Published
- 2019
25. Run-time demand estimation and modulation of on-chip decaps at system level for leakage power reduction in multicore chips
- Author
Cheng Zhuo, Leilei Wang, and Pingqiang Zhou
- Subjects
Multi-core processor; Computer science; Computation; 020208 electrical & electronic engineering; Demand estimation; Hardware_PERFORMANCEANDRELIABILITY; 02 engineering and technology; Leakage power; Chip; Capacitance; 020202 computer hardware & architecture; Hardware and Architecture; Hardware_INTEGRATEDCIRCUITS; 0202 electrical engineering, electronic engineering, information engineering; Electronic engineering; System level; Electrical and Electronic Engineering; Software; Leakage (electronics)
- Abstract
The leakage power of decaps accounts for a large portion of total chip leakage power. In this paper, we propose an approximate approach to estimate, at run time, the amount of “on” capacitance required for each decap, enabling run-time decap modulation in multicore chips, and we further develop two techniques (incremental calculation and sparsification) to improve the approximate approach. Results on a set of benchmarks show that our approach achieves about 45% saving in decap leakage on average, and the approximate approach further reduces the computation cost by up to 22× with an accuracy loss of less than 1%.
- Published
- 2019
26. System-level design consideration and optimization of through-silicon-via inductor
- Author
-
Baixin Chen and Cheng Zhuo
- Subjects
010302 applied physics ,Electronic system-level design and verification ,Engineering ,Through-silicon via ,business.industry ,Electrical engineering ,Three-dimensional integrated circuit ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Voltage regulator ,Inductor ,01 natural sciences ,020202 computer hardware & architecture ,Design for manufacturability ,Inductance ,Hardware_GENERAL ,Hardware and Architecture ,0103 physical sciences ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,Electrical and Electronic Engineering ,Routing (electronic design automation) ,business ,Software - Abstract
Due to the slow scaling of board and package technology, on-chip inductors have shown promising potential to enable more compact designs and smaller parasitics for inductor-based circuits such as voltage regulators, resonant clocking, and filters. On the other hand, a conventional on-chip 2D spiral inductor must be placed on the top metal layers, consuming significant routing resources for global interconnects. Moreover, it may need dedicated shielding to prevent unwanted coupling, which further increases its area. With the popularity of 2.5D and 3D chip architectures, Through-Silicon-Vias (TSVs) have been widely used, and a significant portion of them are placed for thermal, manufacturability, or reliability purposes. These redundant TSVs can thus be utilized to form on-chip inductors for 2.5D/3D chips, with a smaller footprint and higher inductance density than the conventional spiral inductor. Unlike prior works focusing on the inductor itself, this paper discusses the optimization and application of such TSV-inductors from a system perspective, including the optimization options and design considerations. The possible design options for optimizing the TSV-inductor, including physical parameters, architecture, and materials, are thoroughly investigated. Based on that, we further study a few key design scenarios to evaluate the design impact of using such TSV-inductors and provide design guidelines for their application in actual system designs.
- Published
- 2019
27. RCoNet: Deformable Mutual Information Maximization and High-order Uncertainty-aware Learning for Robust COVID-19 Detection
- Author
-
Yu Fu, Cheng Zhuo, Qianqian Yang, Shunjie Dong, and Mei Tian
- Subjects
FOS: Computer and information sciences ,Computer Networks and Communications ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,Feature extraction ,Computer Science - Computer Vision and Pattern Recognition ,Expert Systems ,Diagnosis, Differential ,Deep Learning ,Discriminative model ,Artificial Intelligence ,FOS: Electrical engineering, electronic engineering, information engineering ,Humans ,business.industry ,Deep learning ,Image and Video Processing (eess.IV) ,Uncertainty ,COVID-19 ,Reproducibility of Results ,Pattern recognition ,Pneumonia ,Mutual information ,Maximization ,Thorax ,Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science Applications ,Feature (computer vision) ,Neural Networks, Computer ,Artificial intelligence ,Noise (video) ,Tomography, X-Ray Computed ,business ,Focus (optics) ,Algorithms ,Software ,Information Systems - Abstract
The novel 2019 coronavirus (COVID-19) infection has spread worldwide and is currently a major healthcare challenge around the world. Chest computed tomography (CT) and X-ray imaging are well recognized as two effective techniques for the clinical diagnosis of COVID-19. Because its imaging time is shorter and its cost considerably lower than CT, detecting COVID-19 in chest X-ray (CXR) images is preferred for efficient diagnosis, assessment, and treatment. However, given the similarity between COVID-19 and pneumonia, CXR samples whose deep features lie near category boundaries are easily misclassified by hyperplanes learned from limited training data. Moreover, most existing approaches for COVID-19 detection focus on prediction accuracy and overlook uncertainty estimation, which is particularly important when dealing with noisy datasets. To alleviate these concerns, we propose a novel deep network named RCoNet$^{k}_{s}$ for robust COVID-19 detection, which employs Deformable Mutual Information Maximization (DeIM), Mixed High-order Moment Feature (MHMF), and Multiexpert Uncertainty-aware Learning (MUL). With DeIM, the mutual information (MI) between input data and the corresponding latent representations can be well estimated and maximized to capture compact and disentangled representational characteristics. Meanwhile, MHMF fully exploits high-order statistics to extract discriminative features of complex distributions in medical imaging. Finally, MUL creates multiple parallel dropout networks for each CXR image to evaluate uncertainty and thus prevent performance degradation caused by noise in the data. Experimental results show that RCoNet$^{k}_{s}$ achieves state-of-the-art performance across several metrics on the open-source COVIDx dataset of 15,134 original CXR images. Crucially, our method is shown to be more effective than existing methods in the presence of noise in the data.
- Published
- 2021
28. Robustness of Neuromorphic Computing with RRAM-based Crossbars and Optical Neural Networks
- Author
-
Grace Li Zhang, Huaxi Gu, Ying Zhu, Bing Li, Yiyu Shi, Cheng Zhuo, Xunzhao Yin, Tianchen Wang, Tsung-Yi Ho, and Ulf Schlichtmann
- Subjects
010302 applied physics ,Artificial neural network ,Noise (signal processing) ,Computer science ,business.industry ,Process (computing) ,Inference ,02 engineering and technology ,Residual ,01 natural sciences ,020202 computer hardware & architecture ,Software ,Neuromorphic engineering ,Computer engineering ,Robustness (computer science) ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,business - Abstract
RRAM-based crossbars and optical neural networks are attractive platforms for accelerating neuromorphic computing. However, both accelerators suffer from hardware uncertainties such as process variations. If these uncertainties are left unaddressed, the inference accuracy of these computing platforms can degrade significantly. In this paper, we present a statistical training method in which weights under process variations and noise are modeled as statistical random variables. To incorporate these statistical weights into training, the computations in neural networks are modified accordingly. For optical neural networks, we modify the cost function during software training to reduce the effects of process variations and thermal imbalance. In addition, the residual effects of process variations are extracted and calibrated during hardware test, and thermal variations on devices are compensated in advance. Simulation results demonstrate that inference accuracy can be improved significantly under hardware uncertainties for both platforms.
- Published
- 2021
29. BRoCoM: A Bayesian Framework for Robust Computing on Memristor Crossbar
- Author
-
Di Gao, Zeyu Yang, Qingrong Huang, Grace Li Zhang, Xunzhao Yin, Bing Li, Ulf Schlichtmann, and Cheng Zhuo
- Subjects
Electrical and Electronic Engineering ,Computer Graphics and Computer-Aided Design ,Software - Published
- 2022
30. A Multi-Level-Optimization Framework for FPGA-Based Cellular Neural Network Implementation
- Author
-
Cheng Zhuo, Yiyu Shi, Xiaowei Xu, Shaoheng Luo, and Zhongyang Liu
- Subjects
Flexibility (engineering) ,Signal processing ,Speedup ,Computer science ,Image processing ,02 engineering and technology ,020202 computer hardware & architecture ,Computer architecture ,Hardware and Architecture ,Cellular neural network ,0202 electrical engineering, electronic engineering, information engineering ,Redundancy (engineering) ,020201 artificial intelligence & image processing ,Electrical and Electronic Engineering ,Field-programmable gate array ,Implementation ,Software - Abstract
Cellular Neural Network (CeNN) is considered a powerful paradigm for embedded devices. Its analog and mixed-signal hardware implementations have proved applicable to high-speed image processing, video analysis, and medical signal processing, but their efficiency and popularity are limited by small implementation size and low precision. Recently, digital implementations of CeNNs on FPGAs have attracted researchers from both academia and industry due to their high flexibility and short time-to-market. However, most existing implementations are not well optimized to fully exploit the advantages of the FPGA platform, leaving unnecessary design and computational redundancy that prevents speedup. We propose a multi-level-optimization framework for energy-efficient CeNN implementations on FPGAs. In particular, the framework features optimizations at three levels (system, module, and design space), focusing on computational redundancy and attainable performance. Experimental results show that, across various configurations, our framework can achieve an energy-efficiency improvement of 3.54× and up to 3.88× speedup compared with existing implementations of similar accuracy.
- Published
- 2018
31. A physics-aware methodology for equivalent circuit model extraction of TSV-inductors
- Author
-
Yiyu Shi, Cheng Zhuo, and Baixin Chen
- Subjects
Substrate coupling ,020208 electrical & electronic engineering ,Spice ,Three-dimensional integrated circuit ,020206 networking & telecommunications ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Inductor ,Hardware_GENERAL ,Hardware and Architecture ,Approximation error ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,Scattering parameters ,Equivalent circuit ,Parasitic extraction ,Electrical and Electronic Engineering ,Software - Abstract
TSV-inductors have become a viable on-chip inductor option for achieving low power, low cost, and high integration. It is therefore imperative to model the electrical behavior of a TSV-inductor accurately and efficiently. Unlike the conventional 3D electromagnetic wave model, which cannot be used for efficient time-domain SPICE simulation, this paper presents a systematic equivalent circuit model extraction methodology to accurately model on-chip TSV-inductors in 3D ICs. The circuit topology is based on a π-circuit with additional branches accounting for substrate coupling, signal crosstalk, and skin and proximity effects. The parasitics are then extracted from the measured network parameters using an improved vector fitting method. Experimental results show that the proposed methodology yields a TSV-inductor equivalent circuit with very high accuracy, with a deviation of up to $10^{-4}$ in S-parameter comparison and 0.024% relative error in quality-factor comparison.
- Published
- 2018
32. A saliency-based multiscale approach for infrared and visible image fusion
- Author
-
Jun Chen, Kangle Wu, Linbo Luo, and Cheng Zhuo
- Subjects
Computer science ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Gaussian blur ,02 engineering and technology ,Image (mathematics) ,Phase congruency ,symbols.namesake ,0202 electrical engineering, electronic engineering, information engineering ,Contrast (vision) ,Electrical and Electronic Engineering ,media_common ,Image fusion ,business.industry ,020206 networking & telecommunications ,Pattern recognition ,Filter (signal processing) ,Control and Systems Engineering ,Computer Science::Computer Vision and Pattern Recognition ,Signal Processing ,symbols ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software - Abstract
An ideal fusion of an infrared image and a visible image should fully integrate the bright features of the infrared image while preserving as much of the original visual information of the visible image as possible. To this end, we propose a saliency-based multi-scale decomposition fusion method. In particular, saliency detection and a Gaussian smoothing filter are first employed to decompose the source images into salient layers, detail layers, and base layers. Then we adopt a nonlinear function to calculate the weight coefficients used to fuse the salient layers and highlight the target. Subsequently, we use a fusion rule based on phase congruency to fuse the detail layers, so that details are retained better than with the traditional “max-absolute” fusion rule. Experiments show that the proposed method achieves better fusion results than state-of-the-art methods, both qualitatively and quantitatively. Moreover, to obtain a better visual effect for poorly illuminated fused images, we further propose a contrast enhancement algorithm based on total variation minimization. Experiments show that this algorithm enhances contrast while retaining the details of the source images well.
- Published
- 2021
33. A routing framework for technology migration with bump encroachment
- Author
-
Cheng Zhuo, Wai-Kei Mak, Ting-Chi Wang, Kassan Unda, Yiyu Shi, and Po-Yi Wu
- Subjects
Engineering ,021103 operations research ,business.industry ,Reliability (computer networking) ,Interface (computing) ,Bandwidth (signal processing) ,0211 other engineering and technologies ,02 engineering and technology ,Flow network ,Chip ,020202 computer hardware & architecture ,Hardware and Architecture ,ComputerSystemsOrganization_MISCELLANEOUS ,Embedded system ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,Bumping ,Redistribution layer ,Electrical and Electronic Engineering ,Routing (electronic design automation) ,business ,Software - Abstract
Technology migration plays a critical role in time-to-market competition. Most existing works focus on layout compaction or hardware description language re-synthesis, and pay little attention to the I/O interface in flip chips. The complexity of the bumping process, as well as electrical and reliability considerations, prevents bumps from scaling with transistor sizes. On the other hand, the number of signal bumps cannot be reduced and sometimes even increases due to demands for wider bandwidth and various peripheral devices. As a result, the die area allocated for I/O can no longer accommodate the number of bumps a chip requires. This issue, known as bump encroachment, places stringent requirements on redistribution layer (RDL) routing. In this paper, we first formulate the problem of RDL routing with bump encroachment, and then propose a network-flow-based algorithm to address it efficiently. Experimental results on several benchmarks with parameters extracted from industrial designs show that, compared with a maze-routing-based approach, our algorithm achieves up to 72% wire length reduction.
- Published
- 2017
34. Power Delivery Resonant Virus: Concept and Applications
- Author
-
Cheng Zhuo, Tianhao Shen, Yiyu Shi, and Di Gao
- Subjects
Computer science ,business.industry ,020208 electrical & electronic engineering ,Workload ,02 engineering and technology ,Chip ,Capacitance ,020202 computer hardware & architecture ,Power (physics) ,Inductance ,Noise ,Software ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,business ,Degradation (telecommunications) - Abstract
Various hardware attacks have recently emerged that cause chips in critical civil and military infrastructure to fail. However, most of them jeopardize circuit functionality through additional hardware, for which several countermeasures have been developed. In this paper, we present an interesting yet powerful virus that can cause chip failure. Instead of directly injecting hardware sub-circuits, which requires layout modification or split manufacturing, we use resonant noise in the power delivery system as the weapon. We show that, with simple but specific manipulations at the software layer, repetitive excitations can be created. As the excitation period approaches the resonance period of the power delivery system, which is determined by on-chip capacitance and package inductance, significant voltage overshoot and undershoot can occur, disrupting the regular operation of phase-locked loops and other sensitive components. In short, the virus can hide deep within software programs yet is easy to activate and can impose severe impacts. Experimental results show that the proposed resonant virus may produce noise of up to 33-53% of the nominal supply level, double the noise generated by the PARSEC3 workload. Moreover, the virus causes 8-19% more performance degradation than the regular workload.
- Published
- 2019
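The abstract above attributes the resonance to on-chip capacitance interacting with package inductance. As a rough illustration only, the sketch below assumes the power delivery network can be lumped into a single LC tank, so that $f_r = 1/(2\pi\sqrt{LC})$; real networks have several resonances and damping, and this is not the authors' model. The component values and clock frequency are hypothetical. It converts the resonance period into clock cycles, which is roughly the repetition period a software loop would need to target.

```python
import math


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """First-order LC resonance f_r = 1 / (2 * pi * sqrt(L * C)), in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical lumped values, chosen for illustration only (not from the paper).
L_PKG = 10e-12   # effective package inductance: 10 pH
C_DIE = 100e-9   # effective on-die capacitance: 100 nF
F_CLK = 3e9      # hypothetical core clock: 3 GHz

f_r = resonant_frequency(L_PKG, C_DIE)
period_s = 1.0 / f_r
cycles_per_period = period_s * F_CLK

# A workload alternating high- and low-current phases, each lasting about half
# of this many clock cycles, has its fundamental excitation near f_r.
print(f"f_r = {f_r / 1e6:.1f} MHz, period = {period_s * 1e9:.2f} ns, "
      f"~{cycles_per_period:.0f} clock cycles per period")
```

With these hypothetical values, $f_r$ is roughly 159 MHz, i.e. about 19 cycles of the 3 GHz clock per resonance period.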
35. When Single Event Upset Meets Deep Neural Networks: Observations, Explorations, and Remedies
- Author
-
Wang Liao, Masanori Hashimoto, Cheng Zhuo, Yiyu Shi, Zheyu Yan, and Xichuan Zhou
- Subjects
FOS: Computer and information sciences ,Data processing ,Computer Science - Machine Learning ,Floating point ,Computer Science - Cryptography and Security ,Artificial neural network ,Computer science ,business.industry ,media_common.quotation_subject ,Machine Learning (stat.ML) ,Reliability engineering ,Machine Learning (cs.LG) ,Software ,Robustness (computer science) ,Single event upset ,Statistics - Machine Learning ,Perception ,Static random-access memory ,business ,Cryptography and Security (cs.CR) ,media_common - Abstract
Deep neural networks (DNNs) have proved their potential in various perception tasks and have hence become an appealing option for interpretation and data processing in security-sensitive systems. However, security-sensitive systems demand not only high perception performance but also design robustness under various circumstances. Unlike prior works that study network robustness at the software level, we investigate from a hardware perspective the impact of Single Event Upset (SEU) induced parameter perturbation (SIPP) on neural networks. We systematically define the fault models of SEU and then define sensitivity to SIPP as the robustness measure for the network. We are then able to analytically explore the weaknesses of a network and summarize the key findings on the impact of SIPP on different types of bits in a floating-point parameter, layer-wise robustness within the same network, and the impact of network depth. Based on these findings, we propose two remedies to protect DNNs from SIPPs, which can reduce accuracy degradation from 28% to 0.27% for ResNet with merely 0.24-bit SRAM area overhead per parameter. Comment: 7 pages, 8 figures
- Published
- 2019
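To make the observation about different bit types in a floating-point parameter concrete, the sketch below flips a single bit of a weight, a simple single-bit-upset fault model. It assumes parameters are stored as IEEE 754 binary32 (float32) values; it is not the authors' fault-injection tooling, and the weight value is hypothetical.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = LSB, 31 = sign) of `value` stored as IEEE 754 float32."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


weight = 0.5  # hypothetical network parameter
# binary32 layout: bit 31 = sign, bits 30..23 = exponent, bits 22..0 = mantissa
for bit, field in [(0, "mantissa LSB"), (22, "mantissa MSB"),
                   (30, "exponent MSB"), (31, "sign")]:
    print(f"bit {bit:2d} ({field:12s}): {weight} -> {flip_bit(weight, bit)}")
```

Running it shows that flipping the most significant exponent bit turns 0.5 into $2^{127}$ (about 1.7e38), whereas flipping the mantissa LSB changes the value by only $2^{-24}$, which illustrates why sensitivity depends strongly on which bit is upset.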
36. Introduction to special issue of 2019 China Semiconductor Technology International Conference (CSTIC) Symposium on Design and Automation of Circuits and Systems
- Author
-
Wenjian Yu, Cheng Zhuo, and Weikang Qian
- Subjects
Engineering ,Hardware and Architecture ,business.industry ,Semiconductor technology ,Systems engineering ,Electrical and Electronic Engineering ,business ,China ,Automation ,Software ,Electronic circuit - Published
- 2020
37. A design framework for processing-in-memory accelerator
- Author
-
Di Gao, Cheng Zhuo, and Tianhao Shen
- Subjects
010302 applied physics ,Profiling (computer programming) ,Electronic system-level design and verification ,business.industry ,Computer science ,02 engineering and technology ,01 natural sciences ,Bottleneck ,Matrix multiplication ,020202 computer hardware & architecture ,Software ,Embedded system ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Hardware acceleration ,Performance improvement ,business ,Efficient energy use - Abstract
With the increasing performance mismatch between processor and memory, the "memory wall" has become the bottleneck of the entire computing system. To bridge the gap, processing-in-memory (PIM) has been revisited as a viable option to overcome this challenge, with research spanning from devices to systems. In this paper, we present a complete design framework for PIM-based acceleration that improves energy efficiency and performance. The framework covers system-level design, a prototype architecture, and software stack support to enable hardware accelerator design and optimization. It also features configurability, ease of access, and effective evaluation and profiling. In the experiments, we analyzed a convolutional neural network to identify its least energy-efficient operation and replaced that operation with PIM acceleration. The experimental results show that the proposed accelerator achieves up to 6-9× performance gain for matrix multiplication as well as 10-15× energy improvement compared with a conventional CPU-only implementation.
- Published
- 2018
38. An accelerator-aware microarchitecture simulator for design space exploration
- Author
-
Di Gao and Cheng Zhuo
- Subjects
Focus (computing) ,Software ,Stack (abstract data type) ,business.industry ,Design space exploration ,Computer science ,Programming paradigm ,Hardware acceleration ,Limiting ,business ,Simulation ,Microarchitecture - Abstract
Specialized hardware accelerators have emerged as an efficient approach to mitigating the power wall. Most accelerator designs focus on accelerator RTL synthesis and pay little attention to the communication between core and accelerator, potentially limiting overall system performance. This paper presents an accelerator-aware micro-architectural simulator that supports accelerator design using a high-level language (HLL) description, e.g., C++. The paper also discusses the design of a complete software stack for the simulator, from programming model and user configurability to power profiler. Designers can then use the tool to conduct case studies and performance analyses for design space exploration.
- Published
- 2018
39. A Cross-Layer Approach for Early-Stage Power Grid Design and Optimization
- Author
-
Cheng Zhuo, Houle Gan, Wei-Kai Shih, and Alaeddin Aydiner
- Subjects
Flexibility (engineering) ,Scheme (programming language) ,Engineering ,Power gating ,business.industry ,Power integrity ,Topology (electrical circuits) ,computer.file_format ,Power optimization ,Power (physics) ,Hardware and Architecture ,Electronic engineering ,Common Power Format ,Electrical and Electronic Engineering ,business ,computer ,Software ,computer.programming_language - Abstract
Power integrity has become increasingly important for sub-32nm designs. Many prior works have discussed power grid design and optimization in the post-layout stage, when design changes are inevitably expensive and difficult. In contrast, during the early stage of a development cycle, designers have more flexibility to improve design quality. However, at the early stage, when the design database is incomplete, there are several fundamental challenges, including extraction, modeling, and optimization. This article tackles these fundamental issues of early-stage power grid design, from architecture to layout. The proposed methods have been silicon-validated on 32nm on-market chips and successfully applied to the early-stage power grid design of a 22nm chip. The findings from these practices reveal that, for sub-32nm chips, intrinsic on-die capacitance and the power gate scheme may have a more significant impact on power integrity than expected and need to be well addressed at the early stage.
- Published
- 2015
40. Silicon-Validated Power Delivery Modeling and Analysis on a 32-nm DDR I/O Interface
- Author
-
Alaeddin Aydiner, Cheng Zhuo, Gustavo Wilke, Ritochit Chakraborty, Wei-Kai Shih, and S. Chakravarty
- Subjects
Engineering ,business.industry ,Interface (computing) ,Power integrity ,Power (physics) ,Modeling and simulation ,Inductance ,Noise ,Hardware and Architecture ,Electronic engineering ,Parasitic extraction ,Electrical and Electronic Engineering ,business ,Software ,Jitter - Abstract
Power integrity has become increasingly important for designs at 32 nm and below. This paper discusses a silicon-validated methodology for power delivery (PD) modeling and simulation. Many prior works have focused on PD analysis and optimization; however, none of them provided a comprehensive modeling methodology with post-silicon data to validate the models. In this paper, we present PD system models that achieve less than 10% deviation from supply noise measurements on a 32-nm industrial double data-rate I/O design. Our models capture the unique impacts of on-die inductance, state-dependent coupling capacitance, and die-package interaction. These impacts are prominent for designs at 32 nm and below but were considered negligible or even went unnoticed in earlier technology nodes. Comparisons were made to quantify the impacts of different modeling strategies on supply noise prediction accuracy, providing designers with specific insights for selecting appropriate models for PD analysis.
- Published
- 2015
41. On the Efficacy of Through-Silicon-Via Inductors
- Author
-
Cheng Zhuo, Umamaheswara Rao Tida, Yiyu Shi, and Rongbo Yang
- Subjects
Engineering ,Through-silicon via ,business.industry ,Electrical engineering ,Integrated circuit ,Inductor ,law.invention ,Inductance ,Footprint (electronics) ,Compressed sensing ,Hardware and Architecture ,law ,Q factor ,Electronic engineering ,Point (geometry) ,Electrical and Electronic Engineering ,business ,Software - Abstract
Through-silicon-vias (TSVs) can potentially be used to implement inductors in 3-D integrated systems with minimal footprint and large inductance. However, unlike conventional 2-D spiral inductors, TSV inductors are fully buried in the lossy substrate and thus suffer from low quality factors. In this paper, we systematically examine how various process and design parameters affect their performance. A few interesting phenomena unique to TSV inductors are observed. We then propose a novel shield mechanism utilizing the microchannel, a technique conventionally used for heat removal, to reduce the substrate loss. The technique increases the quality factor and inductance of the TSV inductor by up to $21\times$ and $17\times$, respectively. Finally, since full-wave simulations of 3-D structures are time-consuming, we develop a set of compressed-sensing-based design strategies for microchannel-shielded TSV inductors that require only a minimal number of simulations. These strategies enable us to implement microchannel-shielded TSV inductors with up to $5.44\times$ smaller area than spiral inductors of the same design specs (quality factor, inductance, and frequency). To the best of our knowledge, this is the first in-depth study of TSV inductors aimed at making them practical for high-frequency applications. We hope our study will point out a new and exciting research direction for 3-D integrated circuit designers.
- Published
- 2015
42. Novel Through-Silicon-Via Inductor-Based On-Chip DC-DC Converter Designs in 3D ICs
- Author
-
Cheng Zhuo, Yiyu Shi, and Umamaheswara Rao Tida
- Subjects
Engineering ,Through-silicon via ,business.industry ,Buck converter ,Overhead (engineering) ,Ćuk converter ,Electrical engineering ,Converters ,Inductor ,Inductive coupling ,Hardware and Architecture ,Electronic engineering ,Electrical and Electronic Engineering ,business ,Software ,Voltage - Abstract
There has been tremendous research effort in recent years to move DC-DC converters on chip for enhanced performance. However, a major limiting factor in implementing on-chip inductive DC-DC converters is the large area overhead of spiral inductors. We therefore propose using through-silicon-vias (TSVs), a critical enabling technique in three-dimensional (3D) integrated systems, to implement on-chip inductors for DC-DC converters. While the existing literature shows that TSV inductors are inferior to conventional spiral inductors for RF applications due to substrate loss, in this article we demonstrate that this is not the case for DC-DC converters, which operate at relatively low frequencies. Experimental results show that, by replacing conventional spiral inductors with TSV inductors, up to 4.3× and 3.2× inductor area reduction can be achieved, with almost the same efficiency and output voltage, for the single-phase buck converter and the interleaved buck converter with magnetic coupling, respectively.
- Published
- 2014
43. A Statistical Framework for Post-Fabrication Oxide Breakdown Reliability Prediction and Management
- Author
-
Cheng Zhuo, Dennis Sylvester, and David Blaauw
- Subjects
Very-large-scale integration ,Engineering ,business.industry ,Oxide ,Hardware_PERFORMANCEANDRELIABILITY ,Integrated circuit design ,Maximization ,Chip ,Computer Graphics and Computer-Aided Design ,Reliability engineering ,chemistry.chemical_compound ,chemistry ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Electrical and Electronic Engineering ,Performance improvement ,business ,Software ,Reliability (statistics) ,Voltage - Abstract
Oxide breakdown has become an increasingly pressing reliability issue in modern very large scale integration design with ultrathin oxides. The conventional guard-band methodology assumes uniformly thin oxide thickness, resulting in overly pessimistic reliability estimation that severely degrades system performance. In this paper, we present the use of limited post-fabrication measurements of oxide thicknesses from on-chip sensors to aid in the chip-level oxide breakdown reliability management. A key challenge, which is the focus of this paper, is precisely predicting and managing the reliability condition of each chip with a limited number of measurements and quantifying the tradeoff between reliability margin and system performance. Given the post-fabrication measurements, chip oxide breakdown reliability can be formulated as a conditional distribution that allows one to achieve a significantly more accurate chip lifetime estimation. The estimation is then used to individually tune the supply voltage of each chip for performance maximization while maintaining or improving the reliability. Experimental results show that, by using 25 measurements, the proposed method can achieve an average of 19% performance improvement, and a 27% maximum for a design with up to 50 million devices, with an average operation time of approximately 0.4 s per chip.
- Published
- 2013
44. Process Variation and Temperature-Aware Full Chip Oxide Breakdown Reliability Analysis
- Author
-
Dennis Sylvester, David Blaauw, K. Chopra, and Cheng Zhuo
- Subjects
Engineering ,Spatial correlation ,business.industry ,Monte Carlo method ,Hardware_PERFORMANCEANDRELIABILITY ,Integrated circuit ,Chip ,Computer Graphics and Computer-Aided Design ,law.invention ,Process variation ,Reliability (semiconductor) ,law ,Gate oxide ,Logic gate ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Electrical and Electronic Engineering ,business ,Software - Abstract
Gate oxide breakdown (OBD) is a key factor limiting the useful lifetime of an integrated circuit. Unfortunately, the conventional approach to full-chip OBD reliability analysis assumes a uniform oxide thickness and worst-case temperature for all devices. In practice, however, gate oxide thickness varies both die-to-die and within-die, and hence reliability may differ across devices and even across chips. Moreover, increased across-die temperature variation may exacerbate such differences. Thus, as the precision of variation control worsens, an alternative reliability analysis approach is needed. In this paper, we propose a statistical framework for chip-level gate OBD reliability analysis that considers both the die-to-die and within-die components of thickness variation as well as across-die temperature variation. The thickness of each device is modeled as a distinct random variable, so the full-chip reliability estimation problem is defined on a huge sample space of several million devices. We observe that the chip-level OBD reliability function is independent of the relative locations of the individual devices. This enables us to transform the problem into a representation expressed in terms of far fewer random variables. Using this transformation, we present a computationally efficient and accurate approach for estimating full-chip reliability while considering spatial correlations of gate oxide thickness as well as temperature variation. We show that, compared with Monte Carlo simulation, the proposed method incurs an error of only around 1% while improving runtime by more than three orders of magnitude.
- Published
- 2011
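The key observation in the gate-oxide-breakdown abstract above, that chip-level OBD reliability depends on the distribution of device oxide thicknesses rather than on where each device sits, can be illustrated with a weakest-link Weibull model. This is a minimal sketch under stated assumptions, not the paper's framework: the Weibull shape and scale functions, their constants, the nominal thickness, the sigma values, and every function name are hypothetical, and spatial correlation and temperature variation are ignored. Under the weakest-link assumption, the chip reliability is the product of device reliabilities, so its logarithm is a plain sum that can be regrouped by thickness bin.

```python
import math
import random
from collections import Counter

# ASSUMPTION: illustrative Weibull parameters as functions of oxide
# thickness x (nm). The forms and constants are made up for this sketch.
def weibull_shape(x_nm: float) -> float:
    return 1.5 * x_nm                                # beta grows with thickness

def weibull_scale_s(x_nm: float) -> float:
    return 1e13 * math.exp(4.0 * (x_nm - 1.2))       # alpha, in seconds

def device_hazard(x_nm: float, t_s: float) -> float:
    """Cumulative Weibull hazard (t/alpha)^beta of a single device."""
    return (t_s / weibull_scale_s(x_nm)) ** weibull_shape(x_nm)

def chip_hazard_exact(thicknesses, t_s):
    """Weakest-link model: the chip survives only if every device does, so
    R_chip(t) = exp(-sum of device hazards). math.fsum is exactly rounded,
    so the result does not depend on device order (i.e. on location)."""
    return math.fsum(device_hazard(x, t_s) for x in thicknesses)

def chip_hazard_binned(thicknesses, t_s, bin_nm=0.001):
    """Because only the thickness histogram matters, evaluate one hazard
    per thickness bin instead of one per device."""
    counts = Counter(round(x / bin_nm) for x in thicknesses)
    return math.fsum(n * device_hazard(k * bin_nm, t_s) for k, n in counts.items())

def sample_die(n_devices, rng, nominal_nm=1.2, sigma_d2d=0.02, sigma_wid=0.02):
    """One die: a shared die-to-die offset plus independent within-die noise.
    Spatial correlation and temperature variation are ignored for brevity."""
    d2d = rng.gauss(0.0, sigma_d2d)
    return [nominal_nm + d2d + rng.gauss(0.0, sigma_wid) for _ in range(n_devices)]

if __name__ == "__main__":
    rng = random.Random(1)
    xs = sample_die(100_000, rng)
    t = 10 * 365.25 * 24 * 3600                      # ten years, in seconds
    h_exact = chip_hazard_exact(xs, t)
    h_binned = chip_hazard_binned(xs, t)
    n_bins = len(Counter(round(x / 0.001) for x in xs))
    print(f"R_exact  = {math.exp(-h_exact):.6f}")
    print(f"R_binned = {math.exp(-h_binned):.6f} using {n_bins} bins")
```

Shuffling the device list leaves `chip_hazard_exact` unchanged, and the binned version needs only a few hundred hazard evaluations instead of one per device; the paper's actual transformation, which also handles spatial correlation and temperature, is not reproduced here.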
45. Sensor-Driven Reliability and Wearout Management
- Author
-
Cheng Zhuo, David Blaauw, Dennis Sylvester, Eric Karl, and Prashant Singh
- Subjects
Engineering ,business.industry ,Transistor ,Hardware_PERFORMANCEANDRELIABILITY ,Chip ,Reliability engineering ,law.invention ,Process variation ,Reliability (semiconductor) ,Hardware and Architecture ,law ,MOSFET ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Electrical and Electronic Engineering ,business ,Software ,Degradation (telecommunications) ,Electronic circuit ,Voltage - Abstract
In this article, we propose two new approaches to improve the existing DRM (dynamic reliability management) methodology. First, we propose reliability sensors that use small replicated circuits to directly measure device wearout on the chip. A direct degradation measurement by these sensors removes a layer of uncertainty introduced by inaccurate calibration of the degradation models. Note that, despite using the degradation sensors, we still require the degradation models in order to make reliability projections for the chip's remaining lifetime. Aggressive oxide thickness scaling has caused large vertical electric fields in MOSFET devices, a situation that makes oxide breakdown a crucial issue when supply voltage is not scaled as aggressively as transistor feature size. It therefore becomes increasingly difficult to ensure the reliability of ICs over their lifetime.
- Published
- 2009
46. Power Grid Analysis and Optimization Using Algebraic Multigrid
- Author
-
Min Zhao, Kangsheng Chen, Cheng Zhuo, and Jiang Hu
- Subjects
Reduction (complexity) ,Speedup ,Multigrid method ,Computer science ,Process (computing) ,Electronic engineering ,Electrical and Electronic Engineering ,Solver ,Grid ,Computer Graphics and Computer-Aided Design ,Software ,Computational science ,Interpolation - Abstract
This paper presents a class of power grid analysis and optimization techniques, all of which are based on the algebraic-multigrid (AMG) method. First, a new AMG-based reduction scheme is proposed to improve the efficiency of reducing the problem size for power grid analysis and optimization. Next, with the proposed reduction technique, a fast transient-analysis method is developed and extended to an accurate solver with an error-control mechanism. After that, the scope of this method is further broadened to handle analysis of a modified grid. Finally, a fast decap-allocation (DA) scheme based on AMG is proposed. Experimental results show that these techniques not only achieve a significant speedup over reported industrial methods but also enhance the quality of solutions. By using the proposed techniques, transient analysis with 200 time steps on a 1.6-M-node power grid can be completed in less than 5 min; dc analysis on the same circuit can reach an accuracy of in about 141 s. Our DA can process a circuit with up to one million nodes in about 11 min.
- Published
- 2008
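The AMG idea in the power grid abstract above (build a coarse problem from the matrix entries alone, then use it to accelerate the fine-grid solve) can be sketched as a two-level aggregation solver for DC IR-drop analysis. This is a minimal sketch under stated assumptions, not the paper's AMG reduction scheme: the 1D rail, the conductance values, the load currents, and all function names are hypothetical, and greedy pairwise aggregation with piecewise-constant interpolation is a swapped-in AMG variant. Dense Python lists are used for readability, whereas a real grid with millions of nodes needs sparse storage and more levels.

```python
import math

def chain_power_grid(n, g_wire=10.0, g_pad=100.0):
    """Conductance matrix G (siemens) of a hypothetical 1D power rail with n
    nodes and a supply pad at each end; G d = i_load gives the IR drop d."""
    G = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):                      # wire segment between i and i+1
        G[i][i] += g_wire
        G[i + 1][i + 1] += g_wire
        G[i][i + 1] -= g_wire
        G[i + 1][i] -= g_wire
    G[0][0] += g_pad                            # pads tie the ends to an ideal supply
    G[n - 1][n - 1] += g_pad
    return G

def residual(A, b, x):
    return [bi - sum(a * xj for a, xj in zip(row, x)) for bi, row in zip(b, A)]

def jacobi(A, b, x, omega=2.0 / 3.0, sweeps=2):
    """Damped Jacobi smoothing: damps the oscillatory part of the error."""
    for _ in range(sweeps):
        r = residual(A, b, x)
        x = [xi + omega * ri / A[i][i] for i, (xi, ri) in enumerate(zip(x, r))]
    return x

def pairwise_aggregates(A):
    """Greedy pairwise aggregation that looks only at matrix entries (the
    'algebraic' part): each free node joins its strongest free neighbour."""
    n = len(A)
    agg, n_agg = [-1] * n, 0
    for i in range(n):
        if agg[i] != -1:
            continue
        agg[i] = n_agg
        best, best_val = -1, 0.0
        for j in range(n):
            if j != i and agg[j] == -1 and A[i][j] < best_val:
                best, best_val = j, A[i][j]
        if best != -1:
            agg[best] = n_agg
        n_agg += 1
    return agg, n_agg

def galerkin_coarse(A, agg, n_agg):
    """Coarse operator A_c = P^T A P for piecewise-constant prolongation P."""
    Ac = [[0.0] * n_agg for _ in range(n_agg)]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            Ac[agg[i]][agg[j]] += a
    return Ac

def gauss_solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting (coarse solver)."""
    n = len(M)
    a = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= f * a[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x

def two_grid_solve(A, b, tol=1e-10, max_cycles=500):
    """Two-level cycle: smooth, restrict residual, solve coarse, correct, smooth."""
    agg, n_agg = pairwise_aggregates(A)
    Ac = galerkin_coarse(A, agg, n_agg)
    x = [0.0] * len(b)
    b_norm = math.sqrt(sum(v * v for v in b))
    for cycle in range(1, max_cycles + 1):
        x = jacobi(A, b, x)                              # pre-smoothing
        rc = [0.0] * n_agg
        for i, ri in enumerate(residual(A, b, x)):       # restriction R = P^T
            rc[agg[i]] += ri
        ec = gauss_solve(Ac, rc)                         # exact coarse-grid solve
        x = [xi + ec[agg[i]] for i, xi in enumerate(x)]  # prolongation + correction
        x = jacobi(A, b, x)                              # post-smoothing
        r = residual(A, b, x)
        if math.sqrt(sum(v * v for v in r)) <= tol * b_norm:
            return x, cycle
    return x, max_cycles

if __name__ == "__main__":
    n = 64
    G = chain_power_grid(n)
    i_load = [1e-3] * n                                  # hypothetical 1 mA per node
    drop, cycles = two_grid_solve(G, i_load)
    print(f"{cycles} two-grid cycles, worst IR drop {max(drop) * 1e3:.3f} mV")
```

For clarity the coarse system is re-eliminated every cycle; a practical solver would factor it once, and recursing on `Ac` instead of solving it directly turns this two-level cycle into a multilevel V-cycle.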
47. Sensor Driven Reliability and Wearout Management
- Author
-
Cheng Zhuo, Eric Karl, Prashant Singh, David Blaauw, and Dennis Sylvester
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2013
Discovery Service for Jio Institute Digital Library