A Scalable Multi-TeraOPS Core for AI Training and Inference
- Source :
- IEEE Solid-State Circuits Letters, vol. 1, pp. 217-220
- Publication Year :
- 2018
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE)
Abstract
- This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across a range of neural network topologies by employing a dataflow architecture to provide high throughput and an on-chip scratchpad hierarchy to meet the bandwidth demands of the compute units. A custom 16-b floating point (fp16) representation with 1 sign bit, 6 exponent bits, and 9 mantissa bits has also been developed for high model accuracy in training and inference, along with 1-b/2-b (binary/ternary) integer formats for aggressive inference performance. At 1.5 GHz, the AI core prototype achieves 1.5 TFLOPS fp16, 12 TOPS ternary, or 24 TOPS binary peak performance in 14-nm CMOS.
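- The abstract only specifies the bit split of the custom fp16 format (1 sign, 6 exponent, 9 mantissa bits). As a minimal sketch of how such a word might be interpreted, the Python snippet below decodes that layout; the exponent bias of 31 and the zero-encoding rule are assumptions for illustration, not details given in the letter.

```python
# Sketch: decoding a 16-bit word laid out as 1 sign / 6 exponent / 9 mantissa bits.
# The bias (31) and the all-zero-word-means-zero rule are assumptions; the
# abstract only states the bit widths of the format.

def decode_fp16_1_6_9(bits: int) -> float:
    """Interpret a 16-bit word under the assumed 1-6-9 floating-point layout."""
    sign = (bits >> 15) & 0x1        # 1 sign bit
    exponent = (bits >> 9) & 0x3F    # 6 exponent bits
    mantissa = bits & 0x1FF          # 9 mantissa bits

    bias = 31                        # assumed mid-range bias for a 6-bit exponent
    if exponent == 0 and mantissa == 0:
        value = 0.0                  # assumed: all-zero word encodes zero
    else:
        value = (1 + mantissa / 512.0) * 2.0 ** (exponent - bias)
    return -value if sign else value


if __name__ == "__main__":
    # 0x3E00 -> sign 0, exponent 31, mantissa 0 -> 1.0 under the assumed bias
    print(decode_fp16_1_6_9(0x3E00))
```

- Relative to IEEE half precision (5 exponent / 10 mantissa bits), the extra exponent bit roughly squares the representable dynamic range at the cost of one bit of mantissa precision, which is the trade-off the abstract ties to preserving model accuracy during training and inference.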
Details
- ISSN :
- 2573-9603
- Volume :
- 1
- Database :
- OpenAIRE
- Journal :
- IEEE Solid-State Circuits Letters
- Accession number :
- edsair.doi...........9fce01a03750f2d886bc8908899c58be