82 results on '"Nemirovsky, Mario"'
Search Results
2. A Deep Learning Mapper (DLM) for Scheduling on Heterogeneous Systems
- Author
-
Nemirovsky, Daniel, Arkose, Tugberk, Markovic, Nikola, Nemirovsky, Mario, Unsal, Osman, Cristal, Adrian, Valero, Mateo, Barbosa, Simone Diniz Junqueira, Series editor, Chen, Phoebe, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Yuan, Junsong, Series editor, Zhou, Lizhu, Series editor, Mocskos, Esteban, editor, and Nesmachnow, Sergio, editor
- Published
- 2018
3. Tackling IoT Ultra Large Scale Systems: Fog Computing in Support of Hierarchical Emergent Behaviors
- Author
-
Roca, Damian, Milito, Rodolfo, Nemirovsky, Mario, Valero, Mateo, Rahmani, Amir M., editor, Liljeberg, Pasi, editor, Preden, Jürgo-Sören, editor, and Jantsch, Axel, editor
- Published
- 2018
4. Evaluating the Success of Companies at University Science Parks: Key Performance and Innovation Indicators
- Author
-
Olvera, Claudia, primary, Piqué, Josep M., additional, Cortés, Ulises, additional, and Nemirovsky, Mario, additional
- Published
- 2019
5. Disaggregated Computing. An Evaluation of Current Trends for Datacentres
- Author
-
Meyer, Hugo, Sancho, José Carlos, Quiroga, Josue V., Zyulkyarov, Ferad, Roca, Damián, and Nemirovsky, Mario
- Published
- 2017
6. A Deep Learning Mapper (DLM) for Scheduling on Heterogeneous Systems
- Author
-
Nemirovsky, Daniel, primary, Arkose, Tugberk, additional, Markovic, Nikola, additional, Nemirovsky, Mario, additional, Unsal, Osman, additional, Cristal, Adrian, additional, and Valero, Mateo, additional
- Published
- 2017
7. Tackling IoT Ultra Large Scale Systems: Fog Computing in Support of Hierarchical Emergent Behaviors
- Author
-
Roca, Damian, primary, Milito, Rodolfo, additional, Nemirovsky, Mario, additional, and Valero, Mateo, additional
- Published
- 2017
8. An Energy-Efficient Design Paradigm for a Memory Cell Based on Novel Nanoelectromechanical Switches
- Author
-
Seyedi, Azam, primary, Karakostas, Vasileios, additional, Cosemans, Stefan, additional, Cristal, Adrian, additional, Nemirovsky, Mario, additional, and Unsal, Osman, additional
- Published
- 2017
9. SABES: statistical available bandwidth estimation from passive TCP measurements
- Author
-
Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Serral García, René, Ciaccia, Francesco, Romero, Iván, and Nemirovsky, Mario
- Abstract
Estimating available network resources is fundamental when adapting the sending rate at both the application and transport layers. Traditional approaches rely either on active probing techniques or on iteratively adapting the average sending rate, as is the case for modern TCP congestion control algorithms. In this paper, we propose a statistical method based on the inter-packet arrival time analysis of TCP acknowledgments to estimate a path's available bandwidth. SABES first estimates the bottleneck link capacity by exploiting the traffic patterns of the TCP flow's slow start. Then, a heuristic based on the capacity estimate provides an approximation of the end-to-end available bandwidth. Exhaustive experimentation on both simulations and real-world scenarios was conducted to validate our technique, and our results are promising. Furthermore, we train an artificial neural network to improve the estimation accuracy.
- Published
- 2020
10. SABES: Statistical Available Bandwidth EStimation from passive TCP measurements
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Romero Ruiz, Ivan, Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Serral Gracià, René, and Nemirovsky, Mario
- Abstract
Estimating available network resources is fundamental when adapting the sending rate at both the application and transport layers. Traditional approaches rely either on active probing techniques or on iteratively adapting the average sending rate, as is the case for modern TCP congestion control algorithms. In this paper, we propose a statistical method based on the inter-packet arrival time analysis of TCP acknowledgments to estimate a path's available bandwidth. SABES first estimates the bottleneck link capacity by exploiting the traffic patterns of the TCP flow's slow start. Then, a heuristic based on the capacity estimate provides an approximation of the end-to-end available bandwidth. Exhaustive experimentation on both simulations and real-world scenarios was conducted to validate our technique, and our results are promising. Furthermore, we train an artificial neural network to improve the estimation accuracy., This work was supported by the grant 2015DI023 as part of the Industrial PhD grants of AGAUR and Generalitat de Catalunya. Project co-financed by the Spanish Ministry of Ciencia, Innovación y Universidades with reference RTC-2017-6655-7 (FEDER)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
11. Definition of new WAN paradigms enabled by smart measurements
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Serral Gracià, René, Nemirovsky, Mario, and Ciaccia, Francesco
- Abstract
Nowadays massive amounts of data are being moved over the Internet thanks to data-hungry applications, Big Data, and multimedia content. Combined with reduced cost and increased reliability of high-speed broadband access, this means the whole Internet infrastructure is facing new challenges, especially when information crosses long geographical distances. That is the case for Wide Area Networks (WANs), which are typically traversed in enterprises with multi-site deployments. When a connection is established between end-points that are geographically distant, with high latency and high bandwidth, data flows over a so-called Long Fat Network. Currently, transport protocols in end-points are not able to exploit the resources of such links; notably, the most common TCP implementations still suffer from design flaws that limit their efficiency. More recent developments still suffer from low fairness in resource sharing and lack of global visibility. We identify SD-WAN as an SDN use case that can enable the adoption of new transport protocols, improving traffic behavior over WANs without the need to modify the end-points. In this thesis, we explore new approaches to network measurements that will enable both end-points and SD-WAN edge routers to gain visibility over the end-to-end network status. Such additional visibility promotes the development of smarter control mechanisms for network traffic. The preliminary study carried out covers TCP behavior over WANs and existing methodologies to control its traffic patterns and enforce rate throttling. We also identify a specific use case that poses challenges for WAN scenarios: the Split TCP connections in a Performance Enhancing Proxy (PEP). New control mechanisms to improve resource utilization and fairness are defined in this project.
Specifically, we propose a new approach called Receive Window Modulation (RWM) that allows edge-routers to control the sending rate of a TCP connection by modifying the window advertised by the r, Nowadays, the Internet moves considerable amounts of data because of data-hungry applications (Big Data). Combined with a reduction in costs and an increase in the reliability of broadband access links, the Internet infrastructure has to face new challenges, especially when information has to cross long geographical distances. This is the case for Wide Area Networks (WANs), which typically form part of the infrastructure of companies with multiple sites and offices. Today, transport protocols at the end-points are not able to exploit the resources of WANs; the most common are TCP implementations, which still suffer from design flaws that limit their efficiency. Recent TCP developments still do not guarantee a fair sharing of network resources and lack global visibility. We identify SD-WAN as a use case that can facilitate the adoption of new transport protocols, improving the behavior of network traffic over WANs without the need to modify the end-points. In this thesis we explore new proposals in the field of network measurements, which allow both end-points and edge routers to gain visibility over the state of the network. This added visibility enables the development of smarter network traffic control mechanisms. We identify a specific use case that poses challenges in WAN scenarios: Split TCP connections in the case of a Performance Enhancing Proxy (PEP). New mechanisms that improve the utilization and sharing of network resources are defined in this project.
Specifically, we propose a new scheme called Receive Window Modulation (RWM), which allows edge routers to control the sending rate of a TCP connection by modifying the receive window advertised by the receiver. We show that this controller can improve the efficienc, Postprint (published version)
- Published
- 2020
12. Evaluating University-Business Collaboration at Science Parks: a Business Perspective
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Administració i Direcció d'Empreses, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic, Olvera Herrera, Claudia, Piqué, Josep Ma, Cortés García, Claudio Ulises, and Nemirovsky, Mario
- Abstract
Evaluating the performance of companies at University Science Parks (SPs) is essential for identifying the needs of the companies and the feasibility of University-Business Collaboration (UBC). The companies’ real needs are also of interest to universities and SPs, since they face the challenge of designing strategies that best help them to transfer knowledge more effectively. This research article focuses on Key Performance Indicators (KPIs) in UBC, and on the needs and business objectives of companies co-located at SPs in Spain and Mexico. This article (i) aims to identify the KPIs in UBC used by co-located companies at SPs, and (ii) explores the KPIs in UBC and the critical success factors of SPs. This article focuses on the perspective of companies, with a secondary focus on the perspectives of SPs and universities. For this study, data was collected through online company surveys in Spain and Mexico., Postprint (published version)
- Published
- 2020
13. Advances in the Hierarchical Emergent Behaviors (HEB) approach to autonomous vehicles
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Milito, Rodolfo, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
Widespread deployment of autonomous vehicles (AVs) presents formidable challenges in terms of handling scalability and complexity, particularly regarding vehicular reaction in the face of unforeseen corner cases. Hierarchical Emergent Behaviors (HEB) is a scalable architecture based on the concepts of emergent behaviors and hierarchical decomposition. It relies on a few simple but powerful rules to govern local vehicular interactions. Rather than requiring prescriptive programming of every possible scenario, HEB’s approach relies on global behaviors induced by the application of these local, well-understood rules. Our first two papers on HEB focused on a primal set of rules applied at the first hierarchical level. On the path to systematizing a solid design methodology, this paper proposes additional rules for the second level, studies through simulations the resultant richer set of emergent behaviors, and discusses the communication mechanisms between the different levels., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
14. HIRE: Hidden Inter-packet Red-shift Effect
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Romero Ruiz, Ivan, Serral Gracià, René, and Nemirovsky, Mario
- Abstract
Over the years, different techniques have been proposed to detect the bottleneck bandwidth and available bandwidth of an end-to-end path. However, to the authors' knowledge, no work has been conducted on detecting which link or node on the path could be the narrow link. In this paper, we present a novel technique based on packet-pair dispersion analysis, whose objective is twofold: first, it estimates the narrow link capacity using a new approach that takes into account both inter-packet time and packet propagation delay. Its second objective is to induce the specific hop in the end-to-end path which represents the narrow link. This is achieved by injecting packet trains with intermediate TTL-expiring packets, which decrease the train rate when they cross the narrow link (red-shift effect). We validate our approach in simulations showing the tool's robustness in very complex scenarios., This work was supported by the Industrial PhD grant 2015DI023 of AGAUR and Gencat and the project Efficient Smart Multi Connected Networks co-financed by the Spanish Ministry of Ciencia, Innovación y Universidades with reference RTC-2017-6655-7, the Spanish Agencia Estatal de Investigación and the European Regional Development Fund (FEDER)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
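Note on record 14: the abstract builds on packet-pair dispersion analysis. As a rough illustration only, the sketch below shows the classic packet-pair capacity estimate (packet size divided by the observed dispersion), not the HIRE method itself, which additionally accounts for propagation delay and TTL-expiring packets. The function name and the use of the median as a noise filter are assumptions made for this example.

```python
from statistics import median

def packet_pair_capacity_bps(packet_size_bytes, dispersions_s):
    """Classic packet-pair estimate: capacity = packet size / dispersion.

    dispersions_s holds the measured time gaps (seconds) between the two
    packets of each pair at the receiver. Taking the median is an
    assumption of this sketch, used to damp cross-traffic noise.
    """
    valid = [d for d in dispersions_s if d > 0]
    if not valid:
        raise ValueError("need at least one positive dispersion sample")
    return packet_size_bytes * 8 / median(valid)

# 1500-byte packets spaced about 120 microseconds apart suggest
# a narrow link of roughly 100 Mbit/s.
estimate = packet_pair_capacity_bps(1500, [0.00012, 0.00012, 0.0005])
```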
15. HIRE: Hidden Inter-packet Red-shift Effect
- Author
-
Ciaccia, Francesco, primary, Romero, Ivan, additional, Serral-Gracia, Rene, additional, and Nemirovsky, Mario, additional
- Published
- 2020
16. Design Space Exploration of High-Performance Parallel Architectures
- Author
-
Musoll, Enric, primary and Nemirovsky, Mario, additional
- Published
- 2020
17. Evaluating University-Business Collaboration at Science Parks: a Business Perspective
- Author
-
Olvera, Claudia, primary, Piqué, Josep M., additional, Cortés, Ulises, additional, and Nemirovsky, Mario, additional
- Published
- 2020
18. Advances in the Hierarchical Emergent Behaviors (HEB) Approach to Autonomous Vehicles
- Author
-
Roca, Damian, primary, Milito, Rodolfo, additional, Nemirovsky, Mario, additional, and Valero, Mateo, additional
- Published
- 2020
19. Improving TCP performance and reducing self-induced congestion with receive window modulation
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Romero Ruiz, Ivan, Milito, Rodolfo, Serral Gracià, René, and Nemirovsky, Mario
- Abstract
We present a control module for software edge routers called Receive Window Modulation - RWM. Its main objective is to mitigate what we define as self-induced congestion: the result of traffic emission patterns at the source that cause buffering and packet losses in any of the intermediate routers along the path between the connection's endpoints. The controller modifies the receiver's TCP advertised window to match the computed bandwidth-delay product, based on the connection round-trip time estimation and the bandwidth locally available at the edge router. The implemented controller does not need any endpoint modification, allowing it to be deployed in corporate edge routers, increasing visibility and control capabilities. This scheme, when used in real-world experiments with loss-based congestion control algorithms such as CUBIC, is shown to optimize access link utilization and per-connection goodput, and to reduce latency variability and packet losses., This work was supported by the grant 2015DI023 in the framework of the Industrial PhD grants of AGAUR and Generalitat de Catalunya., Peer Reviewed, Postprint (author's final draft)
- Published
- 2019
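Note on record 19: the abstract describes matching the receiver's advertised window to the bandwidth-delay product (BDP) computed from the round-trip time estimate and the locally available bandwidth. A minimal sketch of that arithmetic follows. The function names, the one-segment floor, and the choice to only ever shrink the window are assumptions made for illustration, not details taken from the paper.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: bandwidth (bit/s) * RTT (s) / 8."""
    return round(bandwidth_bps * rtt_s / 8)

def modulated_window(advertised_window, bandwidth_bps, rtt_s, mss=1460):
    """Return the window an edge router might write into passing ACKs.

    Assumptions of this sketch: the window is never raised above what
    the receiver advertised, and never lowered below one segment (mss).
    """
    target = max(bdp_bytes(bandwidth_bps, rtt_s), mss)
    return min(advertised_window, target)

# 100 Mbit/s available and a 50 ms RTT give a BDP of 625000 bytes,
# so a 1 MiB advertised window would be reduced to 625000.
window = modulated_window(1_048_576, 100e6, 0.05)
```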
20. Improving TCP performance and reducing self-induced congestion with receive window modulation
- Author
-
Montero Banegas, Diego Teodoro, Nemirovsky, Mario Daniel, Serral Graciá, René, Romero, Ivan, Arcas Abella, Oriol, Milito, Rodolfo, and Ciaccia, Francesco
- Abstract
We present a control module for software edge routers called Receive Window Modulation - RWM. Its main objective is to mitigate what we define as self-induced congestion: the result of traffic emission patterns at the source that cause buffering and packet losses in any of the intermediate routers along the path between the connection's endpoints. The controller modifies the receiver's TCP advertised window to match the computed bandwidth-delay product, based on the connection round-trip time estimation and the bandwidth locally available at the edge router. The implemented controller does not need any endpoint modification, allowing it to be deployed in corporate edge routers, increasing visibility and control capabilities. This scheme, when used in real-world experiments with loss-based congestion control algorithms such as CUBIC, is shown to optimize access link utilization and per-connection goodput, and to reduce latency variability and packet losses.
- Published
- 2019
21. The effectiveness of knowledge and technology transfer through university-business collaboration in science parks
- Author
-
Universitat Politècnica de Catalunya. Departament d'Organització d'Empreses, Cortés García, Claudio Ulises, Nemirovsky, Mario, and Olvera, Claudia
- Abstract
Science and Technology Parks (STPs) facilitate the flow of knowledge and technology among universities, R&D institutions, companies and markets, and foster the creation and growth of innovation-based companies. Among the varieties of STPs, it is possible to identify two types: (i) Science Parks (SPs), which involve university shareholding, and (ii) Technology Parks (TPs), which are not owned by universities. This study takes into account only SPs, since they are closely linked to the university and are the bridge between a university and companies in the process of Knowledge and Technology Transfer (KTT). Evaluating the firms' performance in Science Parks is decisive for identifying the needs of the companies and the feasibility of University-Business Collaboration (UBC). The firms' real needs are also of interest to universities and Science Parks, since they face the challenge of designing strategies that best help them to transfer knowledge more effectively. While previous studies have focused on tenants' innovation performance on-Park and off-Park, very little research has taken into account the heterogeneity of Parks, which may affect the firms' performance. This research paper focuses on SPs in Spain and Mexico due to data availability. This paper (i) aims to identify the Key Performance Indicators (KPIs) in UBC used by co-located companies at SPs, and (ii) explores the performance measures (KPIs) in UBC and the critical success factors of SPs. For this study, data was collected through fifty-eight online company surveys in Spain and forty-two in Mexico.
This empirical analysis uses fourteen semi-structured interviews, addressed to SP directors, in order to explore KPIs and success factors of SPs in both countries, Science and Technology Parks (STPs) facilitate the flow of knowledge and technology among universities, research centers, companies and markets, and foster the creation and growth of innovation-based companies. Among the varieties of STPs, two types can be identified: (i) Science Parks (SPs), in which the university holds a shareholding, and (ii) Technology Parks (TPs), in which universities hold a minimal share. This study considers only SPs, since they are closely linked to the university and are the bridge between a university and companies in the process of Knowledge and Technology Transfer (KTT). Evaluating the performance of companies in science parks is decisive for identifying the needs of the companies and the feasibility of University-Business Collaboration (UBC). The real needs of companies are also of interest to universities, since they face the challenge of designing strategies that help them transfer knowledge more effectively. While previous studies have focused on measuring the innovation of companies (on-park and off-park), very little research has taken into account the heterogeneity of SPs, which can affect company performance. This research work focuses on SPs in Spain and Mexico because of data availability. This study (i) aims to identify the Key Performance Indicators (KPIs) in UBC used by companies established in SPs, and (ii) explores the metrics (KPIs) in UBC and the critical success factors of SPs.
For this study, online surveys were sent to nine SPs in Mexico and Spain, and fifty-eight company surveys were obtained in Spain and forty-two in Mexico. This analysis also uses qualitative research (14 semi-structured interviews addressed to SP directors) to explore (KPIs), Postprint (published version)
- Published
- 2019
22. Evaluating University-Business Collaboration at Science Parks: a Business Perspective.
- Author
-
Olvera, Claudia, Piqué, Josep M., Cortés, Ulises, and Nemirovsky, Mario
- Subjects
BUSINESS parks, KNOWLEDGE transfer, RESEARCH parks, COOPERATIVE research, CRITICAL success factor, KEY performance indicators (Management)
- Abstract
Copyright of Triple Helix is the property of Brill Academic Publishers and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2021
23. Evaluation of a Rack-Scale Disaggregated Memory Prototype for Cloud Data Centers
- Author
-
Quiroga, Josue V., primary, Torrents, Marti, additional, Sonmez, Nehir, additional, Theodoropoulos, Dimitris, additional, Zyulkyarov, Ferad, additional, and Nemirovsky, Mario, additional
- Published
- 2019
24. Improving TCP Performance and Reducing Self-Induced Congestion with Receive Window Modulation
- Author
-
Ciaccia, Francesco, primary, Arcas-Abella, Oriol, additional, Montero, Diego, additional, Romero, Ivan, additional, Milito, Rodolfo, additional, Serral-Gracia, Rene, additional, and Nemirovsky, Mario, additional
- Published
- 2019
25. Tackling IoT ultra large scale systems: Fog computing in support of hierarchical emergent behaviors
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Milito, Rodolfo, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
The Internet of Things (IoT) marks a phase transition in the evolution of the Internet, distinguished by massive connectivity and interaction with the physical world. The organic evolution of IoT requires the consideration of three dimensions: scale, organization, and context. These dimensions are particularly relevant in Ultra Large Scale Systems (ULSS), of which autonomous vehicles are a prime example. Fog Computing is well positioned to support contextual awareness and communication, which are critical for ULSS. The design and orchestration of ULSS require fresh approaches and new organizing principles. A recent paper proposed Hierarchical Emergent Behaviors (HEB), an architecture that builds on established concepts of emergent behaviors and hierarchical decomposition and organization. HEB’s local rules induce emergent behaviors, i.e., useful behaviors not explicitly programmed. In this chapter we take a first step to validate HEB concepts through the study of two basic self-driven car “primitives”: exiting a platoon formation, and maneuvering in anticipation of obstacles beyond the range of on-board sensors. Fog nodes provide the critical contextual information required., Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2018
26. A general guide to applying machine learning to computer architecture
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Nemirovsky, Daniel, Arkose, Tugberk, Markovic, Nikola, Nemirovsky, Mario, Unsal, Osman Sabri, Cristal Kestelman, Adrián, and Valero Cortés, Mateo
- Abstract
The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data, which is extremely difficult to do manually, helps to produce effective predictive models. Whilst computer architects have been accelerating the performance of machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve computer system performance. The work that has been conducted, however, has produced considerably promising results. The purpose of this paper is to serve as a foundational base and guide to future computer architecture research seeking to make use of machine learning models for improving system efficiency. We describe a method that highlights when, why, and how to utilize machine learning models for improving system performance and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of data generation every execution quantum and parameter engineering. This is followed by a survey of a set of popular machine learning models. We discuss their strengths and weaknesses and provide an evaluation of implementations for the purpose of creating a workload performance predictor for different core types in an x86 processor. The predictions can then be exploited by a scheduler for heterogeneous processors to improve the system throughput. The algorithms of focus are stochastic gradient descent based linear regression, decision trees, random forests, artificial neural networks, and k-nearest neighbors., This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreement 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P)., Peer Reviewed, Postprint (published version)
- Published
- 2018
27. Analysis and simulation of emergent architectures for internet of things
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Nemirovsky, Mario, Valero Cortés, Mateo, and Roca Marí, Damián
- Abstract
The Internet of Things (IoT) promises a plethora of new services and applications supported by a wide range of devices that includes sensors and actuators. To reach its potential, IoT must break down the silos that limit applications' interoperability and hinder their manageability. These silos result from existing deployment techniques where each vendor sets up its own infrastructure, duplicating the hardware and increasing the costs. Fog Computing can serve as the underlying platform to support IoT applications, thus avoiding the silos. Each application becomes a system formed by IoT devices (e.g. sensors, actuators), an edge infrastructure (e.g. Fog Computing) and the Cloud. In order to improve several aspects of human lives, different systems can interact to correlate data, obtaining functionalities not achievable by any of the systems in isolation. Then, we can analyze the IoT as a whole system rather than a conjunction of isolated systems. Doing so leads to the building of Ultra-Large Scale Systems (ULSS), an extension of the concept of Systems of Systems (SoS), in several verticals including Autonomous Vehicles, Smart Cities, and Smart Grids. The scope of ULSS is large in the number of things and complex in the variety of applications, volume of data, and diversity of communication patterns. To handle this scale and complexity, in this thesis we propose Hierarchical Emergent Behaviors (HEB), a paradigm that builds on the concepts of emergent behavior and hierarchical organization. Rather than explicitly programming all possible situations in the vast space of ULSS scenarios, HEB relies on emergent behaviors induced by local rules that define the interactions of the "things" between themselves and also with their environment. We discuss the modifications to classical IoT architectures required by HEB, as well as the new challenges.
Once these challenges, such as scalability and manageability, are addressed, we can illustrate HEB's usefulness dealing with an IoT-based U, The Internet of Things (IoT) promises a plethora of new services and applications enabled by a wide range of devices that includes sensors and actuators. To reach its potential, IoT must overcome the silos that limit application interoperability and hinder their management. These silos result from existing deployment techniques in which each vendor installs its own infrastructure and duplicates the hardware, increasing costs. Fog Computing can serve as the underlying platform that supports IoT applications, thus avoiding the silos. Each application becomes a system formed by IoT devices (for example, sensors and actuators), an infrastructure (such as Fog Computing), and the cloud. In order to improve several aspects of human life, different systems can interact to correlate data, obtaining functionalities that cannot be achieved by any of the systems in isolation. We can then analyze the IoT as a single system rather than a conjunction of isolated systems. This perspective leads to the construction of Ultra-Large Scale Systems (ULSS), an extension of the concept of Systems of Systems (SoS), in several verticals, including autonomous vehicles, Smart Cities, and Smart Grids. The scope of ULSS is vast in the number of devices and complex in the variety of applications, volume of data, and diversity of communication patterns. To handle this scale and complexity, in this thesis we propose Hierarchical Emergent Behaviors (HEB), a paradigm based on the concepts of emergent behavior and hierarchical organization. 
Rather than explicitly programming all possible situations in the vast space of scenarios present in ULSS, HEB relies on emergent behaviors induced by local rules that define the interactions of the "things" among themselves and also with their environment. We discuss the modifications to the archit, Postprint (published version)
- Published
- 2018
28. iQ: an efficient and flexible queue-based simulation framework
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Nemirovsky, Daniel, Casas, Marc, Moreto Planas, Miquel, Valero Cortés, Mateo, and Nemirovsky, Mario
- Abstract
Conventional system simulators are widely used by computer architects to design and evaluate their processor designs. These simulators provide reasonable levels of accuracy and execution detail but suffer from long simulation latencies and increased implementation complexity. In this work we propose iQ, a queue-based modeling technique that targets design space exploration and optimization studies at the core component level. iQ emulates processor elements by abstracting the implementation details into modular components composed of queue structures, delay parameters, probabilistically driven message generation, and event control. Its easy reconfigurability makes iQ a highly flexible and powerful processor simulator. We have used iQ to build an Ivy Bridge and a Core 2 Duo processor model and have validated them against real hardware running SPEC CPU2006 Int, achieving average error rates of 9.55% and 8.93%., The authors would like to thank Mauricio Breternitz, Rodolfo Milito, and Vasilis Karakostas for their helpful reviews. Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2017
29. Fog function virtualization: A flexible solution for IoT applications
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Quiroga, Josue V., Valero Cortés, Mateo, and Nemirovsky, Mario
- Abstract
Internet of Things applications must carefully assess certain crucial factors such as the real-time and largely distributed nature of the “things”. Fog Computing provides an architecture to satisfy those requirements through nodes located anywhere from near the “things” to the edge. The problem comes with the integration of the Fog nodes into current infrastructures. This process requires the development of complex software solutions and prevents Fog growth. In this paper we propose three innovations to enhance Fog: (i) a new orchestration policy, (ii) the creation of constellations of nodes, and (iii) Fog Function Virtualization (FFV). Together, they will complement Fog to reach its true potential as a generic scalable platform, running multiple IoT applications simultaneously. Deploying a new service is reduced to the development of the application code, a fact that brings the democratization of the Fog Computing paradigm through ease of deployment and cost reduction., The authors thank Rodolfo Milito for his insightful comments and revisions. Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. Josue V. Quiroga's work was supported by a Doctoral Scholarship provided by the Mexican National Council of Science and Technology (CONACyT). This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2017
30. Disaggregated Computing. An Evaluation of Current Trends for Datacentres
- Author
-
Barcelona Supercomputing Center, Meyer, Hugo, Sancho, Jose C., Quiroga, Josue V., Zyulkyarov, Ferad, Roca, Damian, and Nemirovsky, Mario
- Abstract
Next generation data centers will likely be based on the emerging paradigm of disaggregated function-blocks-as-a-unit, departing from the current state of mainboard-as-a-unit. Multiple functional blocks or bricks, such as compute, memory, and peripheral, will be spread through the entire system and interconnected via one or multiple high speed networks. The amount of memory available will be very large and distributed among multiple bricks. This new architecture brings various benefits that are desirable in today’s data centers, such as fine-grained technology upgrade cycles, fine-grained resource allocation, and access to a larger amount of memory and accelerators. An analysis of the impact and benefits of memory disaggregation is presented in this paper. One of the biggest challenges when analyzing these architectures is that memory accesses should be modeled correctly in order to obtain accurate results. However, modeling every memory access would generate a high overhead that can make the simulation unfeasible for real data center applications. A model to represent and analyze memory disaggregation has been designed, and a statistics-based, queuing-based full system simulator was developed to rapidly and accurately analyze application performance in disaggregated systems. With a mean error of 10%, simulation results indicated that the network layers may introduce overheads that degrade applications’ performance by up to 66%. Initial results also suggest that low memory access bandwidth may degrade applications’ performance by up to 20%., This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 687632 (dReDBox project) and TIN2015-65316-P - Computacion de Altas Prestaciones VII., Peer Reviewed, Postprint (published version)
- Published
- 2017
31. A Machine Learning Approach for Performance Prediction and Scheduling on Heterogeneous CPUs
- Author
-
Nemirovsky, Daniel, primary, Arkose, Tugberk, additional, Markovic, Nikola, additional, Nemirovsky, Mario, additional, Unsal, Osman, additional, and Cristal, Adrian, additional
- Published
- 2017
- Full Text
- View/download PDF
32. iQ: An Efficient and Flexible Queue-Based Simulation Framework
- Author
-
Roca, Damian, primary, Nemirovsky, Daniel, additional, Casas, Marc, additional, Moreto, Miquel, additional, Valero, Mateo, additional, and Nemirovsky, Mario, additional
- Published
- 2017
- Full Text
- View/download PDF
33. Fog Function Virtualization: A flexible solution for IoT applications
- Author
-
Roca, Damian, primary, Quiroga, Josue V., additional, Valero, Mateo, additional, and Nemirovsky, Mario, additional
- Published
- 2017
- Full Text
- View/download PDF
34. Scalability of Broadcast Performance in Wireless Network-on-Chip
- Author
-
Abadal, Sergi, primary, Mestres, Albert, additional, Nemirovsky, Mario, additional, Lee, Heekwan, additional, Gonzalez, Antonio, additional, Alarcon, Eduard, additional, and Cabellos-Aparicio, Albert, additional
- Published
- 2016
- Full Text
- View/download PDF
35. Emergent Behaviors in the Internet of Things: The Ultimate Ultra-Large-Scale System
- Author
-
Roca, Damian, primary, Nemirovsky, Daniel, additional, Nemirovsky, Mario, additional, Milito, Rodolfo, additional, and Valero, Mateo, additional
- Published
- 2016
- Full Text
- View/download PDF
36. HFOG: Small versus Big Data
- Author
-
Ferrer-Roca, Olga, Roca, Damian, Nemirovsky, Mario, and Milito, Rodolfo
- Published
- 2015
- Full Text
- View/download PDF
37. Scalability of broadcast performance in wireless network-on-chip
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Mestres Sugrañes, Albert, Nemirovsky, Mario, Lee, Heekwan, González Colás, Antonio María, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
- Abstract
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC., Peer Reviewed, Postprint (published version)
- Published
- 2016
38. Thread assignment in multicore/multithreaded processors: A statistical approach
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Radojković, Petar, Carpenter, Paul Matthew, Moretó Planas, Miquel, Cakarevic, Vladimir, Verdú Mulà, Javier, Pajuelo González, Manuel Alejandro, Cazorla Almeida, Francisco Javier, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works., The introduction of multicore/multithreaded processors, comprised of a large number of hardware contexts (virtual CPUs) that share resources at multiple levels, has made process scheduling, in particular assignment of running threads to available hardware contexts, an important aspect of system performance. Nevertheless, thread assignment of applications running on state-of-the art processors is an NP-complete problem. Over the years, numerous studies have proposed heuristic-based algorithms for thread assignment. Since the thread assignment problem is intractable, it is in general impossible to know the performance of the optimal assignment, so the room for improvement of a given algorithm is also unknown. It is therefore hard to decide whether to invest more effort and time to improve an algorithm that may already be close to optimal. In this paper, we present a statistical approach to the thread assignment problem. First, we present a method that predicts the performance of the optimal thread assignment, based on the observed performance of each thread assignment in a random sample. The method is based on Extreme Value Theory (EVT), a branch of statistics that analyses extreme deviations from the population mean. We also propose sample pruning, a method that significantly reduces the time required to apply the statistical method by reducing the number of candidate solutions that need to be measured. 
Finally, we show that, if no suitable heuristic-based algorithm is available, a sample of several thousand random thread assignments is enough to obtain, with high confidence, an assignment with performance close to optimal. The presented approach is architecture and application independent, and it can be used to address the thread assignment problem in various domains. It is especially well suited for systems in which the workload seldom changes. An example is network systems, which typically provide a constant set of services that are known in advance, with network, This work has been supported by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, the HiPEAC Network of Excellence, and by the European Research Council under the European Union’s 7th FP, ERC Grant Agreement number 321253. Miquel Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047., Peer Reviewed, Postprint (author's final draft)
- Published
- 2016
39. Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach
- Author
-
Radojkovic, Petar, Carpenter, Paul M., Moreto, Miquel, Cakarevic, Vladimir, Verdu, Javier, Pajuelo, Alex, Cazorla, Francisco J., Nemirovsky, Mario, and Valero, Mateo
- Abstract
The introduction of multicore/multithreaded processors, comprised of a large number of hardware contexts (virtual CPUs) that share resources at multiple levels, has made process scheduling, in particular assignment of running threads to available hardware contexts, an important aspect of system performance. Nevertheless, thread assignment of applications running on state-of-the art processors is an NP-complete problem. Over the years, numerous studies have proposed heuristic-based algorithms for thread assignment. Since the thread assignment problem is intractable, it is in general impossible to know the performance of the optimal assignment, so the room for improvement of a given algorithm is also unknown. It is therefore hard to decide whether to invest more effort and time to improve an algorithm that may already be close to optimal. In this paper, we present a statistical approach to the thread assignment problem. First, we present a method that predicts the performance of the optimal thread assignment, based on the observed performance of each thread assignment in a random sample. The method is based on Extreme Value Theory (EVT), a branch of statistics that analyses extreme deviations from the population mean. We also propose sample pruning, a method that significantly reduces the time required to apply the statistical method by reducing the number of candidate solutions that need to be measured. Finally, we show that, if no suitable heuristic-based algorithm is available, a sample of several thousand random thread assignments is enough to obtain, with high confidence, an assignment with performance close to optimal. The presented approach is architecture and application independent, and it can be used to address the thread assignment problem in various domains. It is especially well suited for systems in which the workload seldom changes. An example is network systems, which typically provide a constant set of services that are known in advance, with network
- Published
- 2016
40. Emergent behaviors in the Internet of things: The ultimate ultra-large-scale system
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Nemirovsky, Daniel, Nemirovsky, Mario, Milito, Rodolfo, and Valero Cortés, Mateo
- Abstract
To reach its potential, the Internet of Things (IoT) must break down the silos that limit applications' interoperability and hinder their manageability. Doing so leads to the building of ultra-large-scale systems (ULSS) in several areas, including autonomous vehicles, smart cities, and smart grids. The scope of ULSS is both large and complex. Thus, the authors propose Hierarchical Emergent Behaviors (HEB), a paradigm that builds on the concepts of emergent behavior and hierarchical organization. Rather than explicitly programming all possible decisions in the vast space of ULSS scenarios, HEB relies on the emergent behaviors induced by local rules at each level of the hierarchy. The authors discuss the modifications to classical IoT architectures required by HEB, as well as the new challenges. They also illustrate the HEB concepts in reference to autonomous vehicles. This use case paves the way to the discussion of new lines of research., Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2016
41. Scalability of Broadcast Performance in Wireless Network-on-Chip
- Author
-
Barcelona Supercomputing Center, Abadal Cavallé, Sergi, Mestres Sugrañes, Albert, Nemirovsky, Mario, Lee, Heekwan, Gonzalez, Antonio, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
- Abstract
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC., The authors gratefully acknowledge support from INTEL’s Doctoral Student Honor Program, as well as from Samsung’s Global Research Outreach (GRO) program. This work has been also partially supported by the Catalan Government through a FI-AGAUR grant and by the Spanish State Ministry of Economy and Competitiveness under grant aid PCIN-2015-012., Peer Reviewed, Postprint (author's final draft)
- Published
- 2016
42. Improving the performance and energy-efficiency of virtual memory
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Nemirovsky, Mario, Ünsal, Osman, Cristal Kestelman, Adrián, and Karakostas, Vasileios
- Abstract
Virtual memory improves programmer productivity, enhances process security, and increases memory utilization. However, virtual memory requires an address translation from the virtual to the physical address space on every memory operation. Page-based implementations of virtual memory divide physical memory into fixed size pages, and use a per-process page table to map virtual pages to physical pages. The hardware key component for accelerating address translation is the Translation Lookaside Buffer (TLB), that holds recently used mappings from the virtual to the physical address space. However, address translation still incurs high (i) performance overheads due to costly page table walks after TLB misses, and (ii) energy overheads due to frequent TLB lookups on every memory operation. This thesis quantifies these overheads and proposes techniques to mitigate them. In this thesis we argue that fixed size page-based approaches for address translation exhibit limited potential for improving TLB performance because they increase the TLB reach by a fixed amount. To overcome the limitations of such approaches, we introduce the concept of range translations and we show how they can significantly improve the performance and energy-efficiency of address translation. We first comprehensively quantify the address translation performance overhead on a collection of emerging scale-out applications. We show that address translation accounts for up to 16% of the total execution time. We find that huge pages may improve the application performance by reducing the time spent in page walks, enabling better exploitation of the available execution resources. However, the limited hardware support for huge pages in combination with the workloads' low memory locality leave ample space for performance optimizations. To reduce the performance overheads of address translation, we propose Redundant Memory Mappings (RMM). 
RMM provides an efficient alternative representation of many virtual-to, Virtual memory increases programmer productivity, provides process security, and increases memory utilization. However, virtual memory requires an address translation between the virtual and physical address spaces on every memory operation. Paged implementations of virtual memory divide physical memory into fixed-size pages. The main component for accelerating address translation is the TLB (Translation Lookaside Buffer). However, address translation carries a high performance cost, because of the need to search the page table after a TLB miss, and an energy cost, because of the frequent TLB lookups (one per memory operation). In this thesis we argue that page-based translation mechanisms have limited potential for improving TLB performance, mainly because the set of addresses that the TLB can translate can only be increased by a limited amount. To overcome these limitations, we introduce the concept of range translations and show how this mechanism can significantly improve the performance and energy efficiency of address translation. First, we quantify the performance loss due to translation in emerging applications that scale well as more processors are added. We show that in these applications address translation accounts for up to 16% of execution time. We also show that large pages can improve application performance, enabling better use of the available resources. However, the limited hardware support for large pages, combined with workloads with little locality, leaves ample room for optimization. 
To reduce the performance costs of address translation, we propose RMM (Redundant Memory Mappings). RMM is based on ranges of pages and ofr, Postprint (published version)
- Published
- 2016
43. Range Translations for Fast Virtual Memory
- Author
-
Gandhi, Jayneel, primary, Karakostas, Vasileios, additional, Ayar, Furkan, additional, Cristal, Adrian, additional, Hill, Mark D., additional, McKinley, Kathryn S., additional, Nemirovsky, Mario, additional, Swift, Michael M., additional, and Unsal, Osman S., additional
- Published
- 2016
- Full Text
- View/download PDF
44. Energy-efficient address translation
- Author
-
Karakostas, Vasileios, primary, Gandhi, Jayneel, additional, Cristal, Adrian, additional, Hill, Mark D., additional, McKinley, Kathryn S., additional, Nemirovsky, Mario, additional, Swift, Michael M., additional, and Unsal, Osman S., additional
- Published
- 2016
- Full Text
- View/download PDF
45. Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach
- Author
-
Radojkovic, Petar, primary, Carpenter, Paul M., additional, Moreto, Miquel, additional, Cakarevic, Vladimir, additional, Verdu, Javier, additional, Pajuelo, Alex, additional, Cazorla, Francisco J., additional, Nemirovsky, Mario, additional, and Valero, Mateo, additional
- Published
- 2016
- Full Text
- View/download PDF
46. Broadcast-enabled massive multicore architectures: a wireless RF approach
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Sheinman, Benny, Katz, Oded, Markish, Ofer, Elad, Danny, Fournier, Yvan, Roca, Damian, Hanzich, Mauricio, Houzeaux, Guillaume, Nemirovsky, Mario, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
- Abstract
Broadcast traditionally has been regarded as a prohibitive communication transaction in multiprocessor environments. Nowadays, such a constraint largely drives the design of architectures and algorithms all-pervasive in diverse computing domains, directly and indirectly leading to diminishing performance returns as the many-core era is approaching. Novel interconnect technologies could help revert this trend by offering, among others, improved broadcast support, even in large-scale chip multiprocessors. This article outlines the prospects of wireless on-chip communication technologies pointing toward low-latency (a few cycles) and energy-efficient broadcast (a few picojoules per bit). It also discusses the challenges and potential impact of adopting these technologies as key enablers of unconventional hardware architectures and algorithmic approaches, in the pathway of significantly improving the performance, energy efficiency, scalability, and programmability of many-core chips., Peer Reviewed, Postprint (author's final draft)
- Published
- 2015
47. Virtualized security at the network edge: a user-centric approach
- Author
-
Kuusijarvi, Jarkko, Sassu, Roberto, Montero Banegas, Diego Teodoro, Lioy, Antonio, Basile, Cataldo, Serral Gracià, René, Risso, Fulvio, Ciaccia, Francesco, Jacquin, Ludovic, Georgiades, Michael, Shaw, Adrian, Charalambides, Savvas, Bosco, Francesca, Nemirovsky, Mario, Yannuzzi, Marcelo, and Pastor, Antonio
- Abstract
The current device-centric protection model against security threats has serious limitations. On one hand, the proliferation of user terminals such as smartphones, tablets, notebooks, smart TVs, game consoles, and desktop computers makes it extremely difficult to achieve the same level of protection regardless of the device used. On the other hand, when various users share devices (e.g., parents and kids using the same devices at home), the setup of distinct security profiles, policies, and protection rules for the different users of a terminal is far from trivial. In light of this, this article advocates for a paradigm shift in user protection. In our model, protection is decoupled from users' terminals, and it is provided by the access network through a trusted virtual domain. Each trusted virtual domain provides unified and homogeneous security for a single user irrespective of the terminal employed. We describe a user-centric model where nontechnically savvy users can define their own profiles and protection rules in an intuitive way. We show that our model can harness the virtualization power offered by next-generation access networks, especially from network functions virtualization in the points of presence at the edge of telecom operators. We also analyze the distinctive features of our model, and the challenges faced based on the experience gained in the development of a proof of concept.
- Published
- 2015
48. Networking challenges and prospective impact of broadcast-oriented wireless networks-on-chip
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Nemirovsky, Mario, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
- Abstract
The cost of broadcast has been constraining the design of manycore processors and of the algorithms that run upon them. However, as on-chip RF technologies allow the design of small-footprint and high-bandwidth antennas and transceivers, native low-latency (a few clock cycles) and low-power (a few pJ/bit) broadcast support through wireless communication can be envisaged. In this paper, we analyze the main networking design aspects and challenges of Broadcast-oriented Wireless Network-on-Chip (BoWNoC), which are basically reduced to the development of Medium Access Control (MAC) protocols able to handle hundreds of cores. We evaluate the broadcast performance and scalability of different MAC designs, to then discuss the impact that the proposed paradigm could exert on the performance, scalability and programmability of future manycore architectures, programming models and parallel algorithms. Peer Reviewed. Postprint (published version)
- Published
- 2015
49. NEMsCAM: A novel CAM cell based on nano-electro-mechanical switch and CMOS for energy efficient TLBs
- Author
-
Barcelona Supercomputing Center, Seyedi, Azam, Karakostas, Vasileios, Cosemans, Stefan, Cristal Kestelman, Adrián, Nemirovsky, Mario, and Unsal, Osman
- Abstract
In this paper we propose a novel Content Addressable Memory (CAM) cell, NEMsCAM, based on both Nano-electro-mechanical (NEM) switches and CMOS technologies. The memory component of the proposed CAM cell is designed with two complementary non-volatile NEM switches and located on top of the CMOS-based comparison component. As a use case for the NEMsCAM cell, we design first-level data and instruction Translation Lookaside Buffers (TLBs) with 16nm CMOS technology at 2GHz. The simulations show that the NEMsCAM TLB reduces the energy consumption per search operation (by 27%), write operation (by 41.9%) and standby mode (by 53.9%), and the area (by 40.5%) compared to a CMOS-only TLB with minimal performance overhead. We thank all anonymous reviewers for their insightful comments. This work is supported in part by the European Union (FEDER funds) under contract TIN2012-34557, and the European Union’s Seventh Framework Programme (FP7/2007-2013) under the ParaDIME project (GA no. 318693). Postprint (author's final draft)
- Published
- 2015
50. On the Area and Energy Scalability of Wireless Network-on-Chip: A Model-Based Benchmarked Design Space Exploration
- Author
-
Abadal, Sergi, primary, Iannazzo, Mario, additional, Nemirovsky, Mario, additional, Cabellos-Aparicio, Albert, additional, Lee, Heekwan, additional, and Alarcon, Eduard, additional
- Published
- 2015
- Full Text
- View/download PDF