An In-Memory-Computing DNN Achieving 700 TOPS/W and 6 TOPS/mm² in 130-nm CMOS
- Source: IEEE Journal of Emerging and Selected Topics in Circuits and Systems; 2019, Vol. 9, Issue 2, p. 358-366, 9p
- Publication Year: 2019
Abstract
- Deep neural networks (DNNs) are increasingly popular in machine learning and have achieved state-of-the-art performance on a range of tasks. Typically, the best results are achieved using large amounts of training data and large models, which makes both training and inference complex. While GPUs are used in many applications for the parallel computing they provide, lower-energy platforms have the potential to enable a range of new applications. One observed trend is reducing the precision of weights and activations; previous research has shown that in some cases weights and activations can be binarized [i.e., binarized neural networks (BNNs)], significantly reducing the model size. Exploiting this toward reduced compute energy and reduced data-movement energy, we demonstrate a BNN mapped to a previously presented in-memory-computing architecture, where binarized weights are stored in a standard 6T SRAM bit cell and computations are performed via an analog operation. Using a reduced-size BNN, chosen to fit on the CMOS prototype (in 130 nm), MNIST classification is achieved with only 0.4% accuracy degradation (from 94%), but at 26× lower energy compared to a digital approach implementing the same network. The system reaches over 700-TOPS/W energy efficiency and 6-TOPS/mm² throughput.
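
The paper itself stores binarized weights in 6T SRAM cells and accumulates in the analog domain; as a purely illustrative sketch (not the authors' implementation), the Python snippet below shows the arithmetic identity a BNN relies on: a dot product over ±1 values reduces to an XNOR/popcount over bit-encoded vectors, which is what makes such low-energy hardware mappings possible. All names here (binarize, bnn_dot, bnn_dot_xnor) are hypothetical.

```python
import numpy as np

def binarize(x):
    """Binarize to {-1, +1} via the sign function (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def bnn_dot(a, w):
    """Reference ±1 dot product, computed directly for clarity."""
    return int(np.dot(a.astype(np.int32), w.astype(np.int32)))

def bnn_dot_xnor(a_bits, w_bits, n):
    """Same result via XNOR/popcount on 0/1 encodings (bit 1 ~ +1, bit 0 ~ -1).

    matches - mismatches = matches - (n - matches) = 2*matches - n.
    """
    matches = np.count_nonzero(~(a_bits ^ w_bits) & 1)  # popcount of XNOR
    return 2 * matches - n

# Tiny demo: one binarized neuron with a sign activation.
rng = np.random.default_rng(0)
x = binarize(rng.standard_normal(64))
w = binarize(rng.standard_normal(64))
pre = bnn_dot(x, w)

x_bits = ((x + 1) // 2).astype(np.uint8)
w_bits = ((w + 1) // 2).astype(np.uint8)
assert pre == bnn_dot_xnor(x_bits, w_bits, x.size)

activation = 1 if pre >= 0 else -1
```

In the chip described by the abstract, the popcount-style accumulation is performed not digitally but as an analog operation across SRAM bit cells, which is where the reported energy savings over a digital implementation of the same network come from.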
Details
- Language: English
- ISSN: 2156-3357
- Volume: 9
- Issue: 2
- Database: Supplemental Index
- Journal: IEEE Journal of Emerging and Selected Topics in Circuits and Systems
- Publication Type: Periodical
- Accession number: ejs50336413
- Full Text: https://doi.org/10.1109/JETCAS.2019.2912352