1. Post-Silicon Heat-Source Identification and Machine-Learning-Based Thermal Modeling Using Infrared Thermal Imaging
- Author
-
Sheldon X.-D. Tan, Jinwei Zhang, Hengyang Zhao, Jorg Henkel, Sheriff Sadiqbatcha, and Hussam Amrouch
- Subjects
Artificial neural network ,Infrared ,Computer science ,02 engineering and technology ,Solid modeling ,Computer Graphics and Computer-Aided Design ,020202 computer hardware & architecture ,Computational science ,Transformation (function) ,Thermal ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,Electrical and Electronic Engineering ,Cluster analysis ,Software - Abstract
In this article, we present a novel post-silicon approach to locating the dominant heat sources on commercial multicore processors using heatmaps measured via an infrared (IR) thermal imaging setup. To locate the heat sources, 2-D spatial Laplacian transformation is performed on the heatmaps followed by $K$ -means clustering to find the dominant power/heat-source clusters. This is an exclusively post-silicon approach that does not require any knowledge of the underlying design of the commercial chips other than the information that is publicly available. Since the identified clusters are the thermally vulnerable areas on the die, we then propose a machine-learning-based framework to deriving a thermal model capable of estimating their temperatures during online use. Our approach involves collecting transient temperature data of the aforementioned heat sources and synchronized high-level performance metrics from the chip, and training a long-short-term-memory (LSTM) neural network (NN) that uses the performance metrics as inputs to estimate the temperatures of the identified heat sources in real time. Since the model is meant for real-time use, we explore methods of reducing the performance overhead and inference time of the model. This includes a novel power correlation-based approach to identifying the thermally irrelevant performance metrics and eliminating them in order to reduce the input dimensionality of the model, and an analysis on network sizing to determine the ideal NN configuration for the problem at hand. The model is trained and tested exclusively using measured thermal data from commercial multicore processors. The experimental results from two Intel multicore processors (i5-3337U and i7-8650U) show that the proposed approach achieves very high accuracy (root-mean-square error: 0.55 °C–0.93 °C) in estimating the temperatures of all the identified heat sources on the chip.
- Published
- 2021
- Full Text
- View/download PDF