Back to Search
Start Over
A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks.
- Source :
-
IEEE Transactions on Circuits & Systems. Part I: Regular Papers . Feb2021, Vol. 68 Issue 2, p667-679. 13p. - Publication Year :
- 2021
-
Abstract
- A novel 4T2C ternary embedded DRAM (eDRAM) cell is proposed for computing a vector-matrix multiplication in the memory array. The proposed eDRAM-based compute-in-memory (CIM) architecture addresses a well-known Von Neumann bottle-neck in the traditional computer architecture and improves both latency and energy in processing neural networks. The proposed ternary eDRAM cell takes a smaller area than prior SRAM-based bitcells using 6–12 transistors. Nevertheless, the compact eDRAM cell stores a ternary state (−1, 0, or +1), while the SRAM bitcells can only store a binary state. We also present a method to mitigate the compute accuracy degradation issue due to device mismatches and variations. Besides, we extend the eDRAM cell retention time to $200~\mu \text{s}$ by adding a custom metal capacitor at the storage node. With the improved retention time, the overall energy consumption of eDRAM macro, including a regular refresh operation, is lower than most of prior SRAM-based CIM macros. A $128\times 128$ ternary eDRAM macro computes a vector-matrix multiplication between a vector with 64 binary inputs and a matrix with $64\times 128$ ternary weights. Hence, 128 outputs are generated in parallel. Note that both weight and input bit-precisions are programmable for supporting a wide range of edge computing applications with different performance requirements. The bit-precisions are readily tunable by assigning a variable number of eDRAM cells per weight or adding multiple pulses to input. An embedded column ADC based on replica cells sweeps the reference level for $2^{\mathrm {N}}-1$ cycles and converts the analog accumulated bitline voltage to a 1-5bit digital output. A critical bitline accumulate operation is simulated (Monte-Carlo, 3K runs). It shows the standard deviation of 2.84% that could degrade the classification accuracy of the MNIST dataset by 0.6% and the CIFAR-10 dataset by 1.3% versus a baseline with no variation. The simulated energy is 1.81fJ/operation, and the energy efficiency is 552.5-17.8TOPS/W (for 1-5bit ADC) at 200MHz using 65nm technology. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 15498328
- Volume :
- 68
- Issue :
- 2
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Circuits & Systems. Part I: Regular Papers
- Publication Type :
- Periodical
- Accession number :
- 148207848
- Full Text :
- https://doi.org/10.1109/TCSI.2020.3036209