Hardware-Level Thread Migration to Reduce On-Chip Data Movement Via Reinforcement Learning
- Source :
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39:3638-3649
- Publication Year :
- 2020
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2020.
-
Abstract
- As the number of processing cores and associated threads in chip multiprocessors (CMPs) continues to scale out, on-chip memory access latency dominates application execution time due to increased data movement. Although tiled CMP architectures with distributed shared caches provide a scalable design, increased physical distance between requesting and responding cores has led to both increased on-chip memory access latency and excess energy consumption. Near-data processing is a promising approach that can migrate threads closer to data; however, prior hand-engineered rules for fine-grained hardware-level thread migration are either too slow to react to changes in data access patterns or unable to exploit the large variety of data access patterns. In this article, we propose to use reinforcement learning (RL) to learn relatively complex data access patterns and thereby improve on hardware-level thread migration techniques. By using the recent history of memory access locations as input, each thread learns to recognize the relationship between prior access patterns and future memory access locations. This gives the proposed technique the unique ability to make fewer, more effective migrations to intermediate cores that minimize the distance to multiple distinct memory access locations. By allowing a low-overhead RL agent to learn a policy from real interaction with parallel programming benchmarks in a parallel simulator, we show that a migration policy which recognizes more complex data access patterns can be learned. The proposed approach reduces on-chip data movement and energy consumption by an average of 41%, while reducing execution time by 43% when compared to a simple baseline with no thread migration; furthermore, energy consumption and execution time are reduced by an additional 10% when compared to a hand-engineered fine-grained migration policy.
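- To make the abstract's idea concrete, the sketch below is a minimal, hypothetical illustration of history-based, RL-driven thread migration, not the authors' implementation: a tabular Q-learning agent on an assumed 8x8 mesh whose state is the recent history of memory access locations, whose actions are "stay" or "migrate to an intermediate core near the recent accesses", and whose reward is the negative hop distance of the next access. The mesh size, history length, action set, and reward shaping are all assumptions made for illustration.

```python
# Minimal sketch (assumptions only, not the paper's implementation): a tabular
# Q-learning agent that decides, per memory access, whether to migrate a thread
# toward its data on a tiled CMP.
import random
from collections import defaultdict, deque

MESH = 8                    # assumed 8x8 tiled CMP
HISTORY = 3                 # assumed length of the access-location history used as state
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # assumed learning rate, discount, exploration

def hops(a, b):
    """Manhattan (XY-routing) hop count between two cores on the mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def centroid(locs):
    """Intermediate core that minimizes total Manhattan distance to several
    recent access locations (per-axis median)."""
    xs = sorted(l[0] for l in locs)
    ys = sorted(l[1] for l in locs)
    return (xs[len(xs) // 2], ys[len(ys) // 2])

Q = defaultdict(float)      # Q[(state, action)] -> value
ACTIONS = ("stay", "migrate")

def choose(state):
    """Epsilon-greedy action selection over the small action set."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def step(thread_core, history, next_access):
    """One decision: observe the access history, act, then update Q."""
    state = tuple(history)
    action = choose(state)
    if action == "migrate":
        thread_core = centroid(history)           # move to an intermediate core
    reward = -hops(thread_core, next_access)      # penalize on-chip data movement
    history.append(next_access)                   # slide the history window
    next_state = tuple(history)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    return thread_core

# Toy trace: a thread alternating between data homed at two distant tiles; the
# learned policy tends to settle at a core between them rather than ping-pong.
random.seed(0)
core = (0, 0)
hist = deque([(0, 0)] * HISTORY, maxlen=HISTORY)
trace = [(1, 1), (6, 6)] * 500
for access in trace:
    core = step(core, hist, access)
print("final thread location:", core)
```

- In this toy setting, the history-based state is what lets the agent prefer a single intermediate core over repeated migrations toward each access, mirroring the paper's claim of fewer, more effective migrations.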
- Subjects :
- Computer science
Distributed computing
02 engineering and technology
Energy consumption
Thread (computing)
Computer Graphics and Computer-Aided Design
Execution time
020202 computer hardware & architecture
Instruction set
Data access
Scalability
0202 electrical engineering, electronic engineering, information engineering
Reinforcement learning
Electrical and Electronic Engineering
Latency (engineering)
Software
Details
- ISSN :
- 1937-4151 and 0278-0070
- Volume :
- 39
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Accession number :
- edsair.doi...........2418e25b8ad7586a93ac84f11c8639aa
- Full Text :
- https://doi.org/10.1109/tcad.2020.3012650