A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks.

Authors :
Yu, Chengshuo
Yoo, Taegeun
Kim, Hyunjoon
Kim, Tony Tae-Hyoung
Chuan, Kevin Chai Tshun
Kim, Bongjin
Source :
IEEE Transactions on Circuits & Systems. Part I: Regular Papers. Feb2021, Vol. 68 Issue 2, p667-679. 13p.
Publication Year :
2021

Abstract

A novel 4T2C ternary embedded DRAM (eDRAM) cell is proposed for computing a vector-matrix multiplication in the memory array. The proposed eDRAM-based compute-in-memory (CIM) architecture addresses the well-known von Neumann bottleneck of traditional computer architecture and improves both latency and energy in processing neural networks. The proposed ternary eDRAM cell occupies a smaller area than prior SRAM-based bitcells, which use 6–12 transistors. Despite its compact size, the eDRAM cell stores a ternary state (−1, 0, or +1), whereas the SRAM bitcells can store only a binary state. We also present a method to mitigate the degradation in compute accuracy caused by device mismatches and variations. In addition, we extend the eDRAM cell retention time to 200 μs by adding a custom metal capacitor at the storage node. With the improved retention time, the overall energy consumption of the eDRAM macro, including regular refresh operations, is lower than that of most prior SRAM-based CIM macros. A 128×128 ternary eDRAM macro computes a vector-matrix multiplication between a vector of 64 binary inputs and a matrix of 64×128 ternary weights, so 128 outputs are generated in parallel. Both weight and input bit-precisions are programmable to support a wide range of edge computing applications with different performance requirements. The bit-precisions are readily tunable by assigning a variable number of eDRAM cells per weight or by applying multiple pulses to the input. An embedded column ADC based on replica cells sweeps the reference level for 2^N − 1 cycles and converts the analog accumulated bitline voltage to a 1–5 bit digital output. A critical bitline accumulate operation is simulated (Monte Carlo, 3K runs). It shows a standard deviation of 2.84%, which could degrade the classification accuracy on the MNIST dataset by 0.6% and on the CIFAR-10 dataset by 1.3% versus a baseline with no variation. The simulated energy is 1.81 fJ/operation, and the energy efficiency is 552.5–17.8 TOPS/W (for 1–5 bit ADC) at 200 MHz using 65 nm technology. [ABSTRACT FROM AUTHOR]
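
The computation summarized in the abstract can be illustrated with a small behavioral sketch. It is not the authors' circuit or simulation model: it assumes the binary inputs are encoded as 0/1, treats the bitline accumulation as an ideal integer sum, and assumes the swept reference levels are uniformly spaced over the full output range. The function names `vmm` and `sweep_adc` are hypothetical and chosen only for this example.

```python
# Behavioral sketch only; all modeling choices below are assumptions,
# not details taken from the paper.

ROWS, COLS = 64, 128  # 64 binary inputs, 64x128 ternary weights (from the abstract)


def vmm(inputs, weights):
    """Ideal vector-matrix multiply: one accumulated value per column.

    inputs  : list of ROWS values, assumed to be 0 or 1
    weights : ROWS x COLS nested list with entries in {-1, 0, +1}
    """
    return [
        sum(inputs[r] * weights[r][c] for r in range(ROWS))
        for c in range(COLS)
    ]


def sweep_adc(value, n_bits, lo=-ROWS, hi=ROWS):
    """Quantize `value` by comparing it against 2**n_bits - 1 swept references.

    The reference levels are assumed to be uniformly spaced between lo and hi.
    The output code is the number of reference levels the value reaches.
    """
    cycles = 2 ** n_bits - 1
    code = 0
    for k in range(1, cycles + 1):
        ref = lo + (hi - lo) * k / (cycles + 1)
        if value >= ref:
            code += 1
    return code


def column_outputs(inputs, weights, n_bits):
    """Run the ideal accumulate step, then digitize all columns."""
    return [sweep_adc(v, n_bits) for v in vmm(inputs, weights)]
```

In this sketch, the number of comparison cycles grows as 2^N − 1, that is, 1 cycle for a 1-bit output and 31 cycles for a 5-bit output.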

Details

Language :
English
ISSN :
15498328
Volume :
68
Issue :
2
Database :
Academic Search Index
Journal :
IEEE Transactions on Circuits & Systems. Part I: Regular Papers
Publication Type :
Periodical
Accession number :
148207848
Full Text :
https://doi.org/10.1109/TCSI.2020.3036209