
A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks.

Authors :: Yu, Chengshuo
Yoo, Taegeun
Kim, Hyunjoon
Kim, Tony Tae-Hyoung
Chuan, Kevin Chai Tshun
Kim, Bongjin
Source :: IEEE Transactions on Circuits & Systems. Part I: Regular Papers. Feb2021, Vol. 68 Issue 2, p667-679. 13p.
Publication Year :: 2021
Abstract: A novel 4T2C ternary embedded DRAM (eDRAM) cell is proposed for computing a vector-matrix multiplication in the memory array. The proposed eDRAM-based compute-in-memory (CIM) architecture addresses the well-known von Neumann bottleneck of traditional computer architectures and improves both latency and energy in processing neural networks. The proposed ternary eDRAM cell occupies a smaller area than prior SRAM-based bitcells using 6–12 transistors. Moreover, the compact eDRAM cell stores a ternary state (−1, 0, or +1), whereas the SRAM bitcells can only store a binary state. We also present a method to mitigate the compute-accuracy degradation caused by device mismatches and variations. In addition, we extend the eDRAM cell retention time to 200 µs by adding a custom metal capacitor at the storage node. With the improved retention time, the overall energy consumption of the eDRAM macro, including regular refresh operations, is lower than that of most prior SRAM-based CIM macros. A 128 × 128 ternary eDRAM macro computes a vector-matrix multiplication between a vector with 64 binary inputs and a matrix with 64 × 128 ternary weights; hence, 128 outputs are generated in parallel. Both weight and input bit-precisions are programmable to support a wide range of edge computing applications with different performance requirements. The bit-precisions are readily tuned by assigning a variable number of eDRAM cells per weight or by applying multiple pulses to each input. An embedded column ADC based on replica cells sweeps the reference level for 2^N − 1 cycles and converts the analog accumulated bitline voltage to a 1- to 5-bit digital output. A critical bitline accumulate operation is simulated (Monte Carlo, 3K runs). It shows a standard deviation of 2.84%, which could degrade the classification accuracy by 0.6% on the MNIST dataset and by 1.3% on the CIFAR-10 dataset versus a baseline with no variation.
The simulated energy is 1.81 fJ/operation, and the energy efficiency is 552.5–17.8 TOPS/W (for 1- to 5-bit ADCs) at 200 MHz in 65-nm technology. [ABSTRACT FROM AUTHOR]
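
A minimal, idealized sketch of the functional behavior the abstract describes: 64 binary inputs multiplied by a 64 × 128 ternary weight matrix to give 128 parallel column sums, each digitized by a reference-sweep ADC that steps through 2^N − 1 levels. The function names (`ternary_vmm`, `sweep_adc`, `tops_per_watt`), the even spacing of reference levels over the full accumulation range, and the assumption that energy per operation scales with the number of ADC sweep cycles are all hypothetical and are not taken from the paper. The model ignores bitline analog behavior, device mismatch, and retention.

```python
import numpy as np

N_INPUTS, N_COLS = 64, 128  # array dimensions stated in the abstract


def ternary_vmm(x, w):
    """Ideal column-wise sum of binary inputs times ternary weights."""
    x = np.asarray(x, dtype=int)
    w = np.asarray(w, dtype=int)
    if not np.isin(x, (0, 1)).all():
        raise ValueError("inputs must be binary (0 or 1)")
    if not np.isin(w, (-1, 0, 1)).all():
        raise ValueError("weights must be ternary (-1, 0, +1)")
    # One signed sum per column; each lies in [-len(x), +len(x)].
    return x @ w


def sweep_adc(value, n_bits, full_scale=N_INPUTS):
    """Hypothetical reference-sweep ADC (idealized, no noise or mismatch).

    ASSUMPTION: the 2**n_bits - 1 reference levels are evenly spaced inside
    [-full_scale, +full_scale], one level per sweep cycle. The output code
    counts the cycles at which the input is at or above the reference, so it
    lies in [0, 2**n_bits - 1].
    """
    n_levels = 2 ** n_bits - 1
    step = 2 * full_scale / (n_levels + 1)
    refs = -full_scale + step * np.arange(1, n_levels + 1)
    value = np.asarray(value)
    # Compare every value against every reference level, then count hits.
    return (value[..., None] >= refs).sum(axis=-1)


def tops_per_watt(energy_fj_per_op, adc_cycles=1):
    """Convert fJ/op to TOPS/W: 1 / (E * 1e-15 J) / 1e12 = 1e3 / E.

    ASSUMPTION: energy per op scales linearly with the number of ADC cycles.
    """
    return 1e3 / (energy_fj_per_op * adc_cycles)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=N_INPUTS)             # 64 binary inputs
    w = rng.integers(-1, 2, size=(N_INPUTS, N_COLS))  # 64 x 128 ternary weights
    acc = ternary_vmm(x, w)                           # 128 parallel outputs
    codes = sweep_adc(acc, n_bits=3)                  # 3-bit codes in [0, 7]
    print(acc.shape, codes.min(), codes.max())
    print(round(tops_per_watt(1.81), 1))              # prints 552.5 (1 cycle)
    print(round(tops_per_watt(1.81, 2 ** 5 - 1), 1))  # prints 17.8 (31 cycles)
```

Under the cycle-scaling assumption, both endpoints of the reported range are arithmetically consistent: 1 / 1.81 fJ ≈ 552.5 TOPS/W for a single sweep cycle, and 552.5 / 31 ≈ 17.8 TOPS/W for the 31 cycles of a 5-bit sweep. This only shows the stated numbers are compatible with that reading, not that the paper's energy breakdown follows it exactly.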

Subjects :: *DYNAMIC random access memory
*COMPUTER architecture
*VERNACULAR architecture
*RF values (Chromatography)
*RANDOM access memory
*EDGE computing

Details

Language :: English
ISSN :: 15498328
Volume :: 68
Issue :: 2
Database :: Academic Search Index
Journal :: IEEE Transactions on Circuits & Systems. Part I: Regular Papers
Publication Type :: Periodical
Accession number :: 148207848
Full Text :: https://doi.org/10.1109/TCSI.2020.3036209
