
Reinforcement learning-based register renaming policy for simultaneous multithreading CPUs.

Authors :
Zhan, Huixin
Sheng, Victor S.
Lin, Wei-Ming
Source :
Expert Systems with Applications. Dec 2021, Vol. 186.
Publication Year :
2021

Abstract

Simultaneous multithreading (SMT) improves the performance of superscalar CPUs by exploiting thread-level parallelism with shared entries for better resource utilization. A key issue for this out-of-order execution is that the occupancy latency of a physical rename register can be undesirably long due to many program-execution-dependent factors, resulting in performance degradation. The issue becomes even more problematic in an SMT environment, in which these registers are shared among concurrently running threads. Smartly managing this critical shared resource, so that slower threads do not block faster threads' execution, is essential to advancing SMT performance. In this paper, an actor–critic style reinforcement learning (RL) algorithm is proposed to dynamically assign an upper bound (cap) on the number of rename registers any thread is allowed to use, according to each thread's real-time demand. In particular, a critic network projects the current Issue Queue (IQ) usage, register file usage, and the cap value to a reward; an actor network is trained to project the current IQ usage and register file usage to the optimal real-time cap value by ascending the instructions-per-cycle (IPC) gradient within the trajectory distribution. The proposed method differs from the state of the art (Wang and Lin, 2018) in that the cap on the rename registers for each thread is adjusted in real time according to the policy and the state transitions obtained from self-play. The proposed method improves IPC by up to 162.8% in a 4-threaded system, up to 154.8% in a 6-threaded system, and up to 101.7% in an 8-threaded system. The code is available open source at https://github.com/98k-bot/RL-based-SMT-Register-Renaming-Policy.

• The first reinforcement learning approach for effective resource utilization in SMT.
• A continuous action space for fine-grained and accurate action-space exploration.
• Encouraging model convergence by forming a lower bound on the policy.
• A self-play mechanism compensates for the long time-scale variance of the data.

[ABSTRACT FROM AUTHOR]
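The abstract describes an actor–critic pairing in which the critic scores a (state, cap) combination by its expected IPC reward and the actor proposes a continuous cap value from the observed IQ and register file usage. Below is a minimal sketch of that structure, assuming PyTorch; the state encoding, network sizes, reward hookup, and all names are illustrative assumptions rather than the authors' implementation (the actual code is in the linked repository).

```python
# A minimal actor-critic sketch of the cap-assignment policy described in the
# abstract. Network sizes, the 2-dimensional state encoding, and the random
# stand-in for measured IPC are illustrative assumptions only.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps (IQ usage, register file usage) to a continuous cap value."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # cap as a fraction of the register file
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class Critic(nn.Module):
    """Maps (IQ usage, register file usage, cap) to a scalar reward estimate (IPC)."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, cap: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, cap], dim=-1))

# One illustrative gradient step: the critic regresses toward the observed
# IPC reward, and the actor ascends the critic's reward estimate.
state_dim = 2  # [IQ occupancy, register file occupancy], normalized to [0, 1]
actor, critic = Actor(state_dim), Critic(state_dim)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

state = torch.rand(32, state_dim)   # batch of sampled SMT resource states
observed_ipc = torch.rand(32, 1)    # stand-in for IPC measured from simulation

# Critic update: detach the cap so only critic weights receive gradients.
cap = actor(state)
critic_loss = nn.functional.mse_loss(critic(state, cap.detach()), observed_ipc)
opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

# Actor update: gradient ascent on the critic's estimated IPC reward.
actor_loss = -critic(state, actor(state)).mean()
opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```

In a full training loop the sampled states and rewards would come from the SMT simulator's per-interval IQ/register-file occupancy counters and measured IPC, with the actor's output rescaled to an integer register cap per thread.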

Details

Language :
English
ISSN :
0957-4174
Volume :
186
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
153071838
Full Text :
https://doi.org/10.1016/j.eswa.2021.115717