A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling

Authors :
Jinwook Oh
Alyssa Herbert
Marcel Schaal
Zhibin Ren
Ching Zhou
Siyu Koswatta
Naigang Wang
Matthew Cohen
Vidhi Zalani
Howard M. Haynie
Matthew M. Ziegler
Sae Kyu Lee
Brian W. Curran
Monodeep Kar
Martin Lutz
Xin Zhang
Robert Casatuta
Vijayalakshmi Srinivasan
Nianzheng Cao
Sunil Shukla
Pong-Fei Lu
Leland Chang
Michael A. Guillorn
Bruce M. Fleischer
Michael R. Scheuermann
Joel Abraham Silberman
Kerstin Schelm
Vinay Velji Shah
Chia-Yu Chen
Kailash Gopalakrishnan
Swagath Venkataramani
Hung Tran
Mingu Kang
Wei Wang
Jungwook Choi
Scot H. Rider
Jinwook Jung
James J. Bonanno
Radhika Jain
Li Yulong
Xiao Sun
Silvia Melitta Mueller
Kyu-hyoun Kim
Ankur Agrawal
Source :
IEEE Journal of Solid-State Circuits. 57:182-197
Publication Year :
2022
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2022.

Abstract

Reduced-precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions, namely FP16, Hybrid-FP8 (HFP8), INT4, and INT2, to meet diverse application demands for training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without model accuracy degradation. A new HFP8 format, combined with separation of the floating- and fixed-point pipelines and aggressive circuit/architecture optimization, enables performance improvements while maintaining high compute utilization. A high-bandwidth ring protocol enables efficient data communication, while power management using workload-aware clock throttling maximizes performance within a given power budget. The AI chip demonstrates 3.58-TFLOPS/W peak energy efficiency and 26.2-TFLOPS peak performance for HFP8 iso-accuracy training, and 16.9-TOPS/W peak energy efficiency and 104.9-TOPS peak performance for INT4 iso-accuracy inference.
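
To put the headline figures side by side, the short Python sketch below computes the INT4-to-HFP8 ratios for peak throughput and peak energy efficiency, using only the four numbers quoted in the abstract. The variable names are illustrative. The abstract does not say whether peak throughput and peak efficiency were measured at the same operating point, so the sketch deliberately does not divide one by the other to estimate power.

```python
# Peak figures as quoted in the abstract; names are illustrative only.
HFP8_PEAK_TFLOPS = 26.2        # HFP8 training, peak performance
HFP8_PEAK_TFLOPS_PER_W = 3.58  # HFP8 training, peak energy efficiency
INT4_PEAK_TOPS = 104.9         # INT4 inference, peak performance
INT4_PEAK_TOPS_PER_W = 16.9    # INT4 inference, peak energy efficiency

# Ratio of INT4 inference to HFP8 training for each metric.
# These compare ops of different precisions, so they are rough
# indicators, not like-for-like speedups.
throughput_ratio = INT4_PEAK_TOPS / HFP8_PEAK_TFLOPS
efficiency_ratio = INT4_PEAK_TOPS_PER_W / HFP8_PEAK_TFLOPS_PER_W

print(f"INT4 / HFP8 peak throughput ratio: {throughput_ratio:.2f}")
print(f"INT4 / HFP8 peak efficiency ratio: {efficiency_ratio:.2f}")
```

By these quoted figures, INT4 inference reaches roughly four times the peak throughput of HFP8 training, and a somewhat larger multiple of its peak energy efficiency.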

Details

ISSN :
1558-173X and 0018-9200
Volume :
57
Database :
OpenAIRE
Journal :
IEEE Journal of Solid-State Circuits
Accession number :
edsair.doi...........2998556a11fb9c2dbdc4775ef363b4bb
Full Text :
https://doi.org/10.1109/jssc.2021.3120113