1. Exploiting vector code semantics for efficient data cache prefetching
- Author
- Martínez Palau, Francesc; Torrents Lapuerta, Martí; Armejach Sanosa, Adrià; Casas, Marc (Universitat Politècnica de Catalunya, Departament d'Arquitectura de Computadors; Barcelona Supercomputing Center)
- Abstract
Emerging workloads from domains like high-performance computing, data analytics, or deep learning consume large amounts of memory bandwidth. To mitigate this problem, computing systems include large and deep cache hierarchies that exploit both spatial and temporal locality. In this context, hardware data cache prefetching is a useful method to anticipate cache misses and boost performance. Despite their high coverage rates, current data cache prefetchers issue a significant number of late and sometimes useless prefetches. Additionally, these state-of-the-art prefetchers are not aware of architecture trends towards larger vector units and vector-length agnostic instruction sets. This paper demonstrates that these trends bring new prefetching opportunities that make it possible to increase the accuracy and timeliness of any state-of-the-art prefetcher at a negligible area cost. We propose the Register Vector Length Agnostic (ReVeLA) prefetcher, which exploits program semantics present in vectorized codes. ReVeLA complements existing data cache prefetchers by providing highly accurate prefetch requests that improve prefetching timeliness and accuracy without significantly increasing memory bandwidth consumption. When applied on top of a state-of-the-art out-of-order vector processor, ReVeLA delivers a speed-up of 1.23× with respect to a system without any prefetching approach. When combined with the NextLine, BOP, SPP, and PPF prefetchers, ReVeLA improves performance by 6.57%, 4.46%, 11.83%, and 11.40%, respectively, with respect to a vector processor equipped with these prefetching approaches. Additionally, our evaluation demonstrates that ReVeLA increases memory bandwidth consumption by only 3.74% when combined with the best-performing data cache prefetcher of our experimental campaign.
- Funding
This work has been partially supported by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 (contract PID2019-107255GB-C21) and ESF Investing in your future, the Generalitat de Catalunya (contract 2021-SGR-00763), the European HiPEAC Network of Excellence, and the European Processor Initiative (EPI), which is part of the European Union's Horizon 2020 research and innovation programme under grant agreement No. 826647. A. Armejach is a Serra Hunter Fellow. The authors thank the Departament de Recerca i Universitats de la Generalitat de Catalunya for supporting the Research Group "Performance understanding, analysis, and simulation/emulation of novel architectures" (Code: 2021 SGR 00865).
- Notes
Peer reviewed. Postprint (author's final draft).
- Published
- 2024
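- Illustration
As context for the abstract, the sketch below shows what a vector-length agnostic loop looks like in practice. It is written with Arm SVE ACLE intrinsics purely as an illustration of the kind of vectorized memory accesses whose semantics (vector length, predicate, and unit-stride addresses carried by vector loads and stores) a vector-aware prefetcher such as ReVeLA could exploit; the function name and code are hypothetical examples and are not taken from the paper.

```c
#include <arm_sve.h>
#include <stddef.h>

/* Hypothetical example: a vector-length agnostic DAXPY (y[i] += a * x[i]).
 * Each iteration issues unit-stride vector loads and a vector store whose
 * width equals the hardware vector length, so the access pattern of the
 * loop is fully described by the vector memory instructions themselves. */
void daxpy_vla(double *restrict y, const double *restrict x,
               double a, size_t n)
{
    for (size_t i = 0; i < n; i += svcntd()) {      /* advance by one vector of doubles */
        svbool_t pg = svwhilelt_b64(i, n);           /* predicate covers the loop tail  */
        svfloat64_t vx = svld1_f64(pg, &x[i]);       /* unit-stride vector load of x    */
        svfloat64_t vy = svld1_f64(pg, &y[i]);       /* unit-stride vector load of y    */
        vy = svmla_n_f64_x(pg, vy, vx, a);           /* vy = vy + vx * a                */
        svst1_f64(pg, &y[i], vy);                    /* unit-stride vector store of y   */
    }
}
```

Because each vector memory instruction in such code touches a full, predictable span of consecutive cache lines, a prefetcher that observes these instructions can anticipate the next spans accurately; this is the general opportunity the abstract refers to, not a description of ReVeLA's internal mechanism.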