
A two level neural approach combining off-chip prediction with adaptive prefetch filtering

Authors :
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
Barcelona Supercomputing Center
Jamet, Alexandre Valentin
Vavouliotis, Georgios
Jiménez, Daniel A.
Álvarez Martí, Lluc
Casas, Marc
Publication Year :
2024

Abstract

To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether an access will be off-chip with adaptive prefetch filtering at the first-level data cache (L1D). TLP is composed of two connected microarchitectural perceptron predictors, named First Level Predictor (FLP) and Second Level Predictor (SLP). FLP performs accurate off-chip prediction by using several program features based on virtual addresses and a novel selective delay component. The novelty of SLP lies in leveraging off-chip prediction to drive L1D prefetch filtering, using physical addresses and the FLP prediction as features. TLP constitutes the first hardware proposal targeting both off-chip prediction and prefetch filtering with a multilevel perceptron hardware approach, and it requires only 7KB of storage. To demonstrate the benefits of TLP, we compare its performance with state-of-the-art approaches for off-chip prediction and prefetch filtering on a wide range of single-core and multi-core workloads. Our experiments show that TLP reduces average DRAM transactions by 30.7% and 17.7% across single-core and multi-core workloads, respectively, compared to a baseline using state-of-the-art cache prefetchers but no off-chip prediction mechanism, whereas recent work significantly increases DRAM transactions. As a result, TLP achieves geometric mean performance speedups of 6.2% and 11.8% across single-core and multi-core workloads, respectively.
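The abstract describes FLP as a perceptron predictor that hashes program features into weight tables and sums the selected weights to decide whether an access will go off-chip. The sketch below illustrates that general hashed-perceptron scheme; the feature choices, table sizes, thresholds, and class name are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch of a hashed-perceptron off-chip predictor in the
# spirit of FLP. Table size, thresholds, and saturation bounds are
# assumed values for illustration only.

TABLE_SIZE = 256      # entries per feature table (assumed)
TRAIN_THRESHOLD = 8   # retrain while confidence is below this (assumed)
W_MAX, W_MIN = 31, -32  # 6-bit saturating weights (assumed)

class PerceptronOffChipPredictor:
    def __init__(self, num_features=4):
        # One weight table per program feature.
        self.tables = [[0] * TABLE_SIZE for _ in range(num_features)]

    def _indices(self, features):
        # Hash each feature (e.g. pieces of the virtual address, PC)
        # into its own table.
        return [hash((i, f)) % TABLE_SIZE for i, f in enumerate(features)]

    def predict(self, features):
        # Sum the selected weights; a non-negative sum predicts off-chip.
        s = sum(self.tables[i][idx]
                for i, idx in enumerate(self._indices(features)))
        return s >= 0, s

    def train(self, features, went_off_chip):
        # Update on a misprediction or when confidence is low.
        pred, s = self.predict(features)
        if pred != went_off_chip or abs(s) < TRAIN_THRESHOLD:
            delta = 1 if went_off_chip else -1
            for i, idx in enumerate(self._indices(features)):
                w = self.tables[i][idx] + delta
                self.tables[i][idx] = max(W_MIN, min(W_MAX, w))
```

As a usage sketch, training the predictor repeatedly on a feature vector whose accesses miss the on-chip hierarchy drives the corresponding weights positive, so subsequent predictions for that vector report off-chip.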
In addition, our evaluation demonstrates that TLP is effective independently of the L1D prefetching logic.

This work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 (contracts PID2019-107255GB-C21 and PID2019-105660RB-C22), and by the Generalitat de Catalunya (contract 2021-SGR00763). This work is supported by the National Science Foundation through grant CCF-1912617 and generous gifts from Intel. Marc Casas has been partially supported by the Grant RYC2017-23269 funded by MCIN/AEI/10.13039/501100011033 and by ESF Investing in your future. The authors acknowledge the support of the Departament de Recerca i Universitats of the Generalitat de Catalunya for the Research Group "Performance understanding, analysis, and simulation/emulation of novel architectures" (code: 2021 SGR 00865).

Peer Reviewed
Postprint (author's final draft)

Details

Database :
OAIster
Notes :
15 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1452494875
Document Type :
Electronic Resource