1. Fully On-Chip MAC at 14 nm Enabled by Accurate Row-Wise Programming of PCM-Based Weights and Parallel Vector-Transport in Duration-Format
- Authors
I. Ok, Samuel S. Choi, Riduan Khaddam-Aljameh, Scott C. Lewis, Charles Mackin, Wilfried Haensch, Kevin W. Brew, Victor Chan, F. Lie, Alexander Friz, Stefano Ambrogio, Marc A. Bergendahl, James J. Demarest, Geoffrey W. Burr, Akiyo Nomura, Atsuya Okazaki, Katie Spoon, Takeo Yasuda, Masatoshi Ishii, Nicole Saulnier, Ishtiaq Ahsan, Pritish Narayanan, Hsinyu Tsai, Vijay Narayanan, Hiroyuki Mori, Y. Kohda, and Kohji Hosokawa
- Subjects
Artificial neural network, business.industry, Computer science, Deep learning, Parallel computing, Chip, Network topology, Electronic, Optical and Magnetic Materials, Phase-change memory, Hardware acceleration, Artificial intelligence, Electrical and Electronic Engineering, business, Massively parallel, MNIST database
- Abstract
Hardware acceleration of deep learning using analog non-volatile memory (NVM) requires large arrays with high device yield, high-accuracy Multiply-Accumulate (MAC) operations, and routing frameworks for implementing arbitrary deep neural network (DNN) topologies. In this article, we present a 14-nm test chip for analog AI inference: it contains multiple arrays of phase-change memory (PCM) devices, each array capable of storing 512 × 512 unique DNN weights and executing massively parallel MAC operations at the location of the data. DNN excitations are transported across the chip using a duration representation on a parallel and reconfigurable 2-D mesh. To accurately transfer inference models to the chip, we describe a closed-loop tuning (CLT) algorithm that programs the four PCM conductances in each weight, achieving …
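The abstract's key ideas — a weight encoded by four PCM conductances, MAC operations driven by duration-encoded excitations, and closed-loop program-and-verify tuning — can be illustrated with a small numerical sketch. This is a hypothetical toy model, not the paper's implementation: the significance factor `F`, the pulse resolution `t_max`, and the `closed_loop_tune` helper are all assumptions made for illustration.

```python
import numpy as np

# Assumed significance ratio between the most-significant (Gp, Gm) and
# least-significant (gp, gm) conductance pairs of one 4-device unit cell.
F = 8.0

def weight_from_conductances(Gp, Gm, gp, gm, f=F):
    """Effective weight of one unit cell: W = (Gp - Gm) + (gp - gm) / f.

    A common multi-conductance encoding; the exact scheme used on the
    test chip may differ.
    """
    return (Gp - Gm) + (gp - gm) / f

def mac_duration_format(weights, excitations, t_max=255):
    """Parallel MAC with duration-encoded excitations (toy model).

    Each excitation x in [0, 1] becomes a pulse of duration
    round(x * t_max); the charge integrated on each output column is
    proportional to sum_i duration_i * weight_ij, i.e. a matrix-vector
    product executed where the weights are stored.
    """
    durations = np.round(np.clip(excitations, 0.0, 1.0) * t_max)
    return durations @ weights / t_max  # normalize back to weight units

def closed_loop_tune(target, read, program, tol=0.01, max_iters=20):
    """Toy closed-loop tuning (CLT): iteratively program a device,
    read back the achieved weight, and correct the residual error
    until it falls within `tol`."""
    for _ in range(max_iters):
        err = target - read()
        if abs(err) <= tol:
            break
        program(err)  # nudge the conductances toward the target
    return read()
```

The duration representation quantizes each excitation to `t_max` levels, so the computed product differs from the exact matrix-vector product by at most one pulse-width per input; the CLT loop converges geometrically as long as each programming step removes a fixed fraction of the residual error.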
- Published
- 2021