Hardware-Level Thread Migration to Reduce On-Chip Data Movement Via Reinforcement Learning
- Source :
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39:3638-3649
- Publication Year :
- 2020
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2020.
-
Abstract
- As the number of processing cores and associated threads in chip multiprocessors (CMPs) continues to scale out, on-chip memory access latency dominates application execution time due to increased data movement. Although tiled CMP architectures with distributed shared caches provide a scalable design, increased physical distance between requesting and responding cores has led to both increased on-chip memory access latency and excess energy consumption. Near-data processing is a promising approach that can migrate threads closer to data; however, prior hand-engineered rules for fine-grained hardware-level thread migration are either too slow to react to changes in data access patterns or unable to exploit the large variety of data access patterns. In this article, we propose to use reinforcement learning (RL) to learn relatively complex data access patterns and thereby improve on hardware-level thread migration techniques. By using the recent history of memory access locations as input, each thread learns to recognize the relationship between prior access patterns and future memory access locations. This gives the proposed technique the unique ability to make fewer, more effective migrations to intermediate cores that minimize the distance to multiple distinct memory access locations. By allowing a low-overhead RL agent to learn a policy from real interaction with parallel programming benchmarks in a parallel simulator, we show that a migration policy which recognizes more complex data access patterns can be learned. The proposed approach reduces on-chip data movement and energy consumption by an average of 41%, while reducing execution time by 43% when compared to a simple baseline with no thread migration; furthermore, energy consumption and execution time are reduced by an additional 10% when compared to a hand-engineered fine-grained migration policy.
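- To make the abstract's idea concrete, the sketch below is a minimal, hypothetical illustration of history-based, RL-driven thread migration, not the authors' implementation: a tabular Q-learning agent on an assumed 8x8 mesh whose state is the recent history of memory access locations, whose actions are "stay" or "migrate to an intermediate core near the recent accesses", and whose reward is the negative hop distance of the next access. The mesh size, history length, action set, and reward shaping are all assumptions made for illustration.

```python
# Minimal sketch (assumptions only, not the paper's implementation): a tabular
# Q-learning agent that decides, per memory access, whether to migrate a thread
# toward its data on a tiled CMP.
import random
from collections import defaultdict, deque

MESH = 8                    # assumed 8x8 tiled CMP
HISTORY = 3                 # assumed length of the access-location history used as state
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # assumed learning rate, discount, exploration

def hops(a, b):
    """Manhattan (XY-routing) hop count between two cores on the mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def centroid(locs):
    """Intermediate core that minimizes total Manhattan distance to several
    recent access locations (per-axis median)."""
    xs = sorted(l[0] for l in locs)
    ys = sorted(l[1] for l in locs)
    return (xs[len(xs) // 2], ys[len(ys) // 2])

Q = defaultdict(float)      # Q[(state, action)] -> value
ACTIONS = ("stay", "migrate")

def choose(state):
    """Epsilon-greedy action selection over the small action set."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def step(thread_core, history, next_access):
    """One decision: observe the access history, act, then update Q."""
    state = tuple(history)
    action = choose(state)
    if action == "migrate":
        thread_core = centroid(history)           # move to an intermediate core
    reward = -hops(thread_core, next_access)      # penalize on-chip data movement
    history.append(next_access)                   # slide the history window
    next_state = tuple(history)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    return thread_core

# Toy trace: a thread alternating between data homed at two distant tiles; the
# learned policy tends to settle at a core between them rather than ping-pong.
random.seed(0)
core = (0, 0)
hist = deque([(0, 0)] * HISTORY, maxlen=HISTORY)
trace = [(1, 1), (6, 6)] * 500
for access in trace:
    core = step(core, hist, access)
print("final thread location:", core)
```

- In this toy setting, the history-based state is what lets the agent prefer a single intermediate core over repeated migrations toward each access, mirroring the paper's claim of fewer, more effective migrations.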
- Subjects :
- Computer science
Distributed computing
02 engineering and technology
Energy consumption
Thread (computing)
Computer Graphics and Computer-Aided Design
Execution time
020202 computer hardware & architecture
Instruction set
Data access
Scalability
0202 electrical engineering, electronic engineering, information engineering
Reinforcement learning
Electrical and Electronic Engineering
Latency (engineering)
Software
Details
- ISSN :
- 1937-4151 and 0278-0070
- Volume :
- 39
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Accession number :
- edsair.doi...........2418e25b8ad7586a93ac84f11c8639aa
- Full Text :
- https://doi.org/10.1109/tcad.2020.3012650