Author: "Nemirovsky, Mario" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Nemirovsky, Mario"' showing total 82 results

1. Evaluating the Success of Companies at University Science Parks: Key Performance and Innovation Indicators

Author: Olvera, Claudia, Piqué, Josep M., Cortés, Ulises, Nemirovsky, Mario, di Prisco, Marco, Series Editor, Chen, Sheng-Hong, Series Editor, Vayas, Ioannis, Series Editor, Kumar Shukla, Sanjay, Series Editor, Sharma, Anuj, Series Editor, Kumar, Nagesh, Series Editor, Wang, Chien Ming, Series Editor, Abu-Tair, Abid, editor, Lahrech, Abdelmounaim, editor, Al Marri, Khalid, editor, and Abu-Hijleh, Bassam, editor
Published: 2020
Full Text: View/download PDF

2. A Deep Learning Mapper (DLM) for Scheduling on Heterogeneous Systems

Author: Nemirovsky, Daniel, Arkose, Tugberk, Markovic, Nikola, Nemirovsky, Mario, Unsal, Osman, Cristal, Adrian, Valero, Mateo, Barbosa, Simone Diniz Junqueira, Series editor, Chen, Phoebe, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Yuan, Junsong, Series editor, Zhou, Lizhu, Series editor, Mocskos, Esteban, editor, and Nesmachnow, Sergio, editor
Published: 2018
Full Text: View/download PDF

3. Tackling IoT Ultra Large Scale Systems: Fog Computing in Support of Hierarchical Emergent Behaviors

Author: Roca, Damian, Milito, Rodolfo, Nemirovsky, Mario, Valero, Mateo, Rahmani, Amir M., editor, Liljeberg, Pasi, editor, Preden, Jürgo-Sören, editor, and Jantsch, Axel, editor
Published: 2018
Full Text: View/download PDF

4. Evaluating the Success of Companies at University Science Parks: Key Performance and Innovation Indicators

Author: Olvera, Claudia, primary, Piqué, Josep M., additional, Cortés, Ulises, additional, and Nemirovsky, Mario, additional
Published: 2019
Full Text: View/download PDF

5. Disaggregated Computing. An Evaluation of Current Trends for Datacentres

Author: Meyer, Hugo, Sancho, José Carlos, Quiroga, Josue V., Zyulkyarov, Ferad, Roca, Damián, and Nemirovsky, Mario
Published: 2017
Full Text: View/download PDF

6. A Deep Learning Mapper (DLM) for Scheduling on Heterogeneous Systems

Author: Nemirovsky, Daniel, primary, Arkose, Tugberk, additional, Markovic, Nikola, additional, Nemirovsky, Mario, additional, Unsal, Osman, additional, Cristal, Adrian, additional, and Valero, Mateo, additional
Published: 2017
Full Text: View/download PDF

7. Tackling IoT Ultra Large Scale Systems: Fog Computing in Support of Hierarchical Emergent Behaviors

Author: Roca, Damian, primary, Milito, Rodolfo, additional, Nemirovsky, Mario, additional, and Valero, Mateo, additional
Published: 2017
Full Text: View/download PDF

8. An Energy-Efficient Design Paradigm for a Memory Cell Based on Novel Nanoelectromechanical Switches

Author: Seyedi, Azam, primary, Karakostas, Vasileios, additional, Cosemans, Stefan, additional, Cristal, Adrian, additional, Nemirovsky, Mario, additional, and Unsal, Osman, additional
Published: 2017
Full Text: View/download PDF

9. SABES: statistical available bandwidth estimation from passive TCP measurements

Author: Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Serral García, René, Ciaccia, Francesco, Romero, Iván, and Nemirovsky, Mario
Abstract: Estimating available network resources is fundamental when adapting the sending rate at both the application and transport layers. Traditional approaches either rely on active probing techniques or iteratively adapt the average sending rate, as is the case for modern TCP congestion control algorithms. In this paper, we propose a statistical method based on the inter-packet arrival time analysis of TCP acknowledgments to estimate a path's available bandwidth. SABES first estimates the bottleneck link capacity by exploiting the traffic patterns of the TCP flow's slow start. Then, a heuristic based on the capacity estimate provides an approximation of the end-to-end available bandwidth. Exhaustive experimentation on both simulations and real-world scenarios was conducted to validate our technique, and our results are promising. Furthermore, we train an artificial neural network to improve the estimation accuracy.
Published: 2020

10. SABES: Statistical Available Bandwidth EStimation from passive TCP measurements

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Romero Ruiz, Ivan, Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Serral Gracià, René, and Nemirovsky, Mario
Abstract: Estimating available network resources is fundamental when adapting the sending rate at both the application and transport layers. Traditional approaches either rely on active probing techniques or iteratively adapt the average sending rate, as is the case for modern TCP congestion control algorithms. In this paper, we propose a statistical method based on the inter-packet arrival time analysis of TCP acknowledgments to estimate a path's available bandwidth. SABES first estimates the bottleneck link capacity by exploiting the traffic patterns of the TCP flow's slow start. Then, a heuristic based on the capacity estimate provides an approximation of the end-to-end available bandwidth. Exhaustive experimentation on both simulations and real-world scenarios was conducted to validate our technique, and our results are promising. Furthermore, we train an artificial neural network to improve the estimation accuracy., This work was supported by the grant 2015DI023 as part of the Industrial PhD grants of AGAUR and Generalitat de Catalunya. Project co-financed by the Spanish Ministry of Ciencia, Innovación y Universidades with reference RTC-2017-6655-7 (FEDER)., Peer Reviewed, Postprint (author's final draft)
Published: 2020

11. Definition of new WAN paradigms enabled by smart measurements

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Serral Gracià, René, Nemirovsky, Mario, and Ciaccia, Francesco
Abstract: Nowadays, massive amounts of data are being moved over the Internet thanks to data-hungry applications, Big Data, and multimedia content. Combined with a reduction in cost and increased reliability of high-speed broadband access, the whole Internet infrastructure is facing new challenges, especially when information crosses long geographical distances. That is the case for Wide Area Networks (WANs), which are typically traversed in enterprises with multi-site deployments. When a connection is established between end-points that are geographically distant, with high latency and high bandwidth, data flows over a so-called Long Fat Network. Currently, transport protocols in end-points are not able to exploit the resources of such links; notably, the most common TCP implementations still suffer from design flaws that limit their efficiency. More recent developments still suffer from low fairness in resource sharing and a lack of global visibility. We identify SD-WAN as an SDN use case that can enable the adoption of new transport protocols, improving traffic behavior over WANs without the need to modify the end-points. In this thesis, we explore new approaches to network measurements that will enable both end-points and SD-WAN edge routers to gain visibility over the end-to-end network status. Such additional visibility promotes the development of smarter control mechanisms for network traffic. The preliminary study carried out covers TCP behavior over WANs and existing methodologies to control its traffic patterns and enforce rate throttling. We also identify a specific use case that poses challenges for WAN scenarios: the Split TCP connections in a Performance Enhancing Proxy (PEP). New control mechanisms to improve resource utilization and fairness are defined in this project.
Specifically, we propose a new approach called Receive Window Modulation (RWM) that allows edge routers to control the sending rate of a TCP connection by modifying the window advertised by the r, Today, the Internet moves considerable amounts of data because of data-intensive applications (Big Data). Combined with lower costs and higher reliability of broadband access links, the Internet infrastructure must face new challenges, especially when information has to cross long geographical distances. This is the case for Wide Area Networks (WANs), which are typically part of the infrastructure of companies with multiple sites and offices. Today, transport protocols at the end-points cannot exploit WAN resources; the most common are TCP implementations, which still suffer from design flaws that limit their efficiency. Recent TCP developments still do not guarantee a fair sharing of network resources and lack global visibility. We identify SD-WAN as a use case that can ease the adoption of new transport protocols, improving network traffic behavior over WANs without the need to modify the end-points. In this thesis, we explore new approaches to network measurements that allow both end-points and edge routers to gain visibility into the state of the network. This added visibility enables the development of smarter network traffic control mechanisms. We identify a specific use case that poses challenges in WAN scenarios: Split TCP connections in a Performance Enhancing Proxy (PEP). The project defines new mechanisms that improve the utilization and sharing of network resources.
Specifically, we propose a new scheme called Receive Window Modulation (RWM), which allows edge routers to control the sending rate of a TCP connection by modifying the receive window advertised by the receiver. We show that this controller can improve the efficienc, Postprint (published version)
Published: 2020

12. Evaluating University-Business Collaboration at Science Parks: a Business Perspective

Author: Universitat Politècnica de Catalunya. Doctorat en Administració i Direcció d'Empreses, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic, Olvera Herrera, Claudia, Piqué, Josep Ma, Cortés García, Claudio Ulises, and Nemirovsky, Mario
Abstract: The evaluation of the companies’ performance at University Science Parks (SPs) is essential for identifying the needs of the companies and the feasibility of University-Business Collaboration (UBC). The companies’ real needs are also of interest to universities and SPs, since they face the challenge of designing strategies that best help them to transfer knowledge more effectively. This research article focuses on Key Performance Indicators (KPIs) in UBC, and on the needs and business objectives of companies co-located at SPs in Spain and Mexico. This article (i) aims to identify the KPIs in UBC used by co-located companies at SPs, and (ii) explores the KPIs in UBC and critical success factors of SPs. This article focuses on the perspective of companies, with a secondary focus on the perspectives of SPs and universities. For this study, data was collected through online company surveys in Spain and Mexico., Postprint (published version)
Published: 2020

13. Advances in the Hierarchical Emergent Behaviors (HEB) approach to autonomous vehicles

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Milito, Rodolfo, Nemirovsky, Mario, and Valero Cortés, Mateo
Abstract: Widespread deployment of autonomous vehicles (AVs) presents formidable challenges in terms of handling scalability and complexity, particularly regarding vehicular reaction in the face of unforeseen corner cases. Hierarchical Emergent Behaviors (HEB) is a scalable architecture based on the concepts of emergent behaviors and hierarchical decomposition. It relies on a few simple but powerful rules to govern local vehicular interactions. Rather than requiring prescriptive programming of every possible scenario, HEB’s approach relies on global behaviors induced by the application of these local, well-understood rules. Our first two papers on HEB focused on a primal set of rules applied at the first hierarchical level. On the path to systematizing a solid design methodology, this paper proposes additional rules for the second level, studies through simulations the resultant richer set of emergent behaviors, and discusses the communication mechanisms between the different levels., Peer Reviewed, Postprint (author's final draft)
Published: 2020

14. HIRE: Hidden Inter-packet Red-shift Effect

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Romero Ruiz, Ivan, Serral Gracià, René, and Nemirovsky, Mario
Abstract: Over the years, different techniques have been proposed to detect the bottleneck bandwidth and available bandwidth of an end-to-end path. However, to the authors' knowledge, no work has been conducted on detecting which link or node on the path could be the narrow link. In this paper, we present a novel technique based on packet-pair dispersion analysis, whose objective is twofold. First, it estimates the narrow link capacity using a new approach that takes into account both inter-packet time and packet propagation delay. Second, it infers the specific hop in the end-to-end path that represents the narrow link. This is achieved by injecting packet trains with intermediate TTL-expiring packets, which decrease the train rate when they cross the narrow link (red-shift effect). We validate our approach in simulations, showing the tool's robustness in very complex scenarios., This work was supported by the Industrial PhD grant 2015DI023 of AGAUR and Gencat and the project Efficient Smart Multi Connected Networks co-financed by the Spanish Ministry of Ciencia, Innovación y Universidades with reference RTC-2017-6655-7, the Spanish Agencia Estatal de Investigación and the European Regional Development Fund (FEDER)., Peer Reviewed, Postprint (author's final draft)
Published: 2020
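
Note on record 14: the textbook packet-pair dispersion estimate that the abstract takes as its starting point can be sketched as below. This is only a minimal illustration of the classic relation (capacity ≈ packet size divided by the arrival gap between back-to-back packets), not the HIRE method itself, which also accounts for propagation delay and uses TTL-expiring packets. The function name, the use of the median, and the sample values are assumptions made for illustration.

```python
from statistics import median


def packet_pair_capacity_bps(packet_bytes, arrival_gaps_s):
    """Estimate narrow-link capacity from packet-pair dispersion.

    Assumption: each gap is the spacing, in seconds, between two packets
    that were sent back to back; the median damps cross-traffic outliers.
    """
    gaps = [g for g in arrival_gaps_s if g > 0]
    if not gaps:
        raise ValueError("need at least one positive arrival gap")
    return packet_bytes * 8 / median(gaps)


# Hypothetical sample: 1500-byte packets arriving about 1.2 ms apart.
estimate = packet_pair_capacity_bps(1500, [0.0012, 0.0012, 0.0030, 0.0012, 0.0011])
```

With these hypothetical gaps the median is 1.2 ms, so the estimate comes out at roughly 10 Mbit/s; the single 3 ms outlier does not move it.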

15. HIRE: Hidden Inter-packet Red-shift Effect

Author: Ciaccia, Francesco, primary, Romero, Ivan, additional, Serral-Gracia, Rene, additional, and Nemirovsky, Mario, additional
Published: 2020
Full Text: View/download PDF

16. Design Space Exploration of High-Performance Parallel Architectures

Author: Musoll, Enric, primary and Nemirovsky, Mario, additional
Published: 2020
Full Text: View/download PDF

17. Evaluating University-Business Collaboration at Science Parks: a Business Perspective

Author: Olvera, Claudia, primary, Piqué, Josep M., additional, Cortés, Ulises, additional, and Nemirovsky, Mario, additional
Published: 2020
Full Text: View/download PDF

18. Advances in the Hierarchical Emergent Behaviors (HEB) Approach to Autonomous Vehicles

Author: Roca, Damian, primary, Milito, Rodolfo, additional, Nemirovsky, Mario, additional, and Valero, Mateo, additional
Published: 2020
Full Text: View/download PDF

19. Improving TCP performance and reducing self-induced congestion with receive window modulation

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Romero Ruiz, Ivan, Milito, Rodolfo, Serral Gracià, René, and Nemirovsky, Mario
Abstract: We present a control module for software edge routers called Receive Window Modulation - RWM. Its main objective is to mitigate what we define as self-induced congestion: the result of traffic emission patterns at the source that cause buffering and packet losses in any of the intermediate routers along the path between the connection's endpoints. The controller modifies the receiver's TCP advertised window to match the computed bandwidth-delay product, based on the connection round-trip time estimation and the bandwidth locally available at the edge router. The implemented controller does not need any endpoint modification, allowing it to be deployed in corporate edge routers, increasing visibility and control capabilities. This scheme, when used in real-world experiments with loss-based congestion control algorithms such as CUBIC, is shown to optimize access link utilization and per-connection goodput, and to reduce latency variability and packet losses., This work was supported by the grant 2015DI023 in the framework of the Industrial PhD grants of AGAUR and Generalitat de Catalunya., Peer Reviewed, Postprint (author's final draft)
Published: 2019
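
Note on record 19: the window computation the abstract describes (matching the advertised window to the bandwidth-delay product) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' controller: the function names and sample figures are hypothetical, and the choice to only ever shrink the receiver's window is an assumption rather than something the abstract states.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes for a link rate (bit/s) and an RTT (ms)."""
    # Divide by 1000 to convert ms to s and by 8 to convert bits to bytes.
    return bandwidth_bps * rtt_ms // 8000


def modulated_window(advertised_bytes: int, bandwidth_bps: int, rtt_ms: int) -> int:
    """Window value an edge router would forward to the sender.

    Assumption: the router only shrinks the receiver's advertised window
    and never enlarges it beyond what the receiver offered.
    """
    return min(advertised_bytes, bdp_bytes(bandwidth_bps, rtt_ms))


# Hypothetical edge link: 100 Mbit/s locally available, 50 ms round-trip time.
window = modulated_window(advertised_bytes=1_048_576,
                          bandwidth_bps=100_000_000, rtt_ms=50)
```

In this hypothetical case the bandwidth-delay product is 625,000 bytes, so a 1 MiB advertised window would be reduced to that value, while a smaller advertised window would pass through unchanged.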

20. Improving TCP performance and reducing self-induced congestion with receive window modulation

Author: Montero Banegas, Diego Teodoro, Nemirovsky, Mario Daniel, Serral Graciá, René, Romero, Ivan, Arcas Abella, Oriol, Milito, Rodolfo, and Ciaccia, Francesco
Abstract: We present a control module for software edge routers called Receive Window Modulation - RWM. Its main objective is to mitigate what we define as self-induced congestion: the result of traffic emission patterns at the source that cause buffering and packet losses in any of the intermediate routers along the path between the connection's endpoints. The controller modifies the receiver's TCP advertised window to match the computed bandwidth-delay product, based on the connection round-trip time estimation and the bandwidth locally available at the edge router. The implemented controller does not need any endpoint modification, allowing it to be deployed in corporate edge routers, increasing visibility and control capabilities. This scheme, when used in real-world experiments with loss-based congestion control algorithms such as CUBIC, is shown to optimize access link utilization and per-connection goodput, and to reduce latency variability and packet losses.
Published: 2019

21. The effectiveness of knowledge and technology transfer through university-business collaboration in science parks

Author: Universitat Politècnica de Catalunya. Departament d'Organització d'Empreses, Cortés García, Claudio Ulises, Nemirovsky, Mario, and Olvera, Claudia
Abstract: Science and Technology Parks (STPs) facilitate the flow of knowledge and technology among universities, R&D institutions, companies, and markets, and foster the creation and growth of innovation-based companies. Among the varieties of STPs, it is possible to identify two types: (i) Science Parks (SPs), which involve university shareholding, and (ii) Technology Parks (TPs), which are not owned by universities. This study takes into account only SPs, since they are closely linked to the university and are the bridge between a university and companies in the process of Knowledge and Technology Transfer (KTT). The evaluation of the firms' performance in Science Parks is decisive for identifying the needs of the companies and the feasibility of University-Business Collaboration (UBC). The firms' real needs are also of interest to universities and Science Parks, since they face the challenge of designing strategies that best help them to transfer knowledge more effectively. While previous studies have focused on tenants' innovation performance on-Park and off-Park, very little research has taken into account the heterogeneity of Parks, which may affect the firms' performance. This research paper focuses on SPs in Spain and Mexico due to data availability. This paper (i) aims to identify the Key Performance Indicators (KPIs) in UBC used by co-located companies at SPs, and (ii) explores the performance measures (KPIs) in UBC and critical success factors of SPs. For this study, data was collected through fifty-eight online company surveys in Spain and forty-two in Mexico.
This empirical analysis uses fourteen semi-structured interviews addressed to SP directors in order to explore KPIs and success factors of SPs in both countries, Science and Technology Parks (STPs) facilitate the flow of knowledge and technology among universities, research centers, companies, and markets, and foster the creation and growth of innovation-based companies. Among the varieties of STPs, two types can be identified: (i) Science Parks (SPs), in which the university holds shares, and (ii) Technology Parks (TPs), in which universities hold a minimal share. This study considers only SPs, since they are closely linked to the university and are the bridge between a university and companies in the process of Knowledge and Technology Transfer (KTT). Evaluating the performance of companies in science parks is decisive for identifying the companies' needs and the feasibility of University-Business Collaboration (UBC). The companies' real needs are also of interest to universities, since they face the challenge of designing strategies that help them transfer knowledge more effectively. While previous studies have focused on measuring the innovation of companies (on-park and off-park), very little research has taken into account the heterogeneity of SPs, which may affect company performance. This research focuses on SPs in Spain and Mexico because of data availability. This study (i) aims to identify the Key Performance Indicators (KPIs) in UBC used by companies established in SPs, and (ii) explores the metrics (KPIs) in UBC and the critical success factors of SPs.
For this study, online surveys were sent to nine SPs in Mexico and Spain, and fifty-eight company surveys were obtained in Spain and forty-two in Mexico. This analysis also uses qualitative research (14 semi-structured interviews addressed to SP directors) to explore (KPI), Postprint (published version)
Published: 2019

22. Evaluating University-Business Collaboration at Science Parks: a Business Perspective.

Author: Olvera, Claudia, Piqué, Josep M., Cortés, Ulises, and Nemirovsky, Mario
Subjects: BUSINESS parks, KNOWLEDGE transfer, RESEARCH parks, COOPERATIVE research, CRITICAL success factor, KEY performance indicators (Management)
Abstract: Copyright of Triple Helix is the property of Brill Academic Publishers and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2021
Full Text: View/download PDF

23. Evaluation of a Rack-Scale Disaggregated Memory Prototype for Cloud Data Centers

Author: Quiroga, Josue V., primary, Torrents, Marti, additional, Sonmez, Nehir, additional, Theodoropoulos, Dimitris, additional, Zyulkyarov, Ferad, additional, and Nemirovsky, Mario, additional
Published: 2019
Full Text: View/download PDF

24. Improving TCP Performance and Reducing Self-Induced Congestion with Receive Window Modulation

Author: Ciaccia, Francesco, primary, Arcas-Abella, Oriol, additional, Montero, Diego, additional, Romero, Ivan, additional, Milito, Rodolfo, additional, Serral-Gracia, Rene, additional, and Nemirovsky, Mario, additional
Published: 2019
Full Text: View/download PDF

25. Tackling IoT ultra large scale systems: Fog computing in support of hierarchical emergent behaviors

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Milito, Rodolfo, Nemirovsky, Mario, and Valero Cortés, Mateo
Abstract: The Internet of Things (IoT) marks a phase transition in the evolution of the Internet, distinguished by massive connectivity and interaction with the physical world. The organic evolution of IoT requires the consideration of three dimensions: scale, organization, and context. These dimensions are particularly relevant in Ultra Large Scale Systems (ULSS), of which autonomous vehicles are a prime example. Fog Computing is well positioned to support contextual awareness and communication, critical for ULSS. The design and orchestration of ULSS require fresh approaches and new organizing principles. A recent paper proposed Hierarchical Emergent Behaviors (HEB), an architecture that builds on established concepts of emergent behaviors and hierarchical decomposition and organization. HEB’s local rules induce emergent behaviors, i.e., useful behaviors not explicitly programmed. In this chapter we take a first step to validate HEB concepts through the study of two basic self-driven car “primitives”: exiting a platoon formation, and maneuvering in anticipation of obstacles beyond the range of on-board sensors. Fog nodes provide the critical contextual information required., Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
Published: 2018

26. A general guide to applying machine learning to computer architecture

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Nemirovsky, Daniel, Arkose, Tugberk, Markovic, Nikola, Nemirovsky, Mario, Unsal, Osman Sabri, Cristal Kestelman, Adrián, and Valero Cortés, Mateo
Abstract: The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data, which is extremely difficult to achieve manually, helps to produce effective predictive models. Whilst computer architects have been accelerating the performance of machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve computer system performance. The work that has been conducted, however, has produced considerably promising results. The purpose of this paper is to serve as a foundational base and guide to future computer architecture research seeking to make use of machine learning models for improving system efficiency. We describe a method that highlights when, why, and how to utilize machine learning models for improving system performance and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of data generation at every execution quantum and of parameter engineering. This is followed by a survey of a set of popular machine learning models. We discuss their strengths and weaknesses and provide an evaluation of implementations for the purpose of creating a workload performance predictor for different core types in an x86 processor. The predictions can then be exploited by a scheduler for heterogeneous processors to improve the system throughput. The algorithms of focus are stochastic gradient descent based linear regression, decision trees, random forests, artificial neural networks, and k-nearest neighbors., This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreement 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P)., Peer Reviewed, Postprint (published version)
Published: 2018

27. Analysis and simulation of emergent architectures for internet of things

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Nemirovsky, Mario, Valero Cortés, Mateo, and Roca Marí, Damián
Abstract: The Internet of Things (IoT) promises a plethora of new services and applications supported by a wide range of devices that includes sensors and actuators. To reach its potential, IoT must break down the silos that limit applications' interoperability and hinder their manageability. These silos result from existing deployment techniques where each vendor sets up its own infrastructure, duplicating the hardware and increasing the costs. Fog Computing can serve as the underlying platform to support IoT applications, thus avoiding the silos. Each application becomes a system formed by IoT devices (e.g., sensors, actuators), an edge infrastructure (e.g., Fog Computing), and the Cloud. In order to improve several aspects of human lives, different systems can interact to correlate data, obtaining functionalities not achievable by any of the systems in isolation. Then, we can analyze the IoT as a whole system rather than a conjunction of isolated systems. Doing so leads to the building of Ultra-Large Scale Systems (ULSS), an extension of the concept of Systems of Systems (SoS), in several verticals including Autonomous Vehicles, Smart Cities, and Smart Grids. The scope of ULSS is large in the number of things and complex in the variety of applications, volume of data, and diversity of communication patterns. To handle this scale and complexity, in this thesis we propose Hierarchical Emergent Behaviors (HEB), a paradigm that builds on the concepts of emergent behavior and hierarchical organization. Rather than explicitly programming all possible situations in the vast space of ULSS scenarios, HEB relies on emergent behaviors induced by local rules that define the interactions of the "things" between themselves and also with their environment. We discuss the modifications to classical IoT architectures required by HEB, as well as the new challenges.
Once these challenges such as scalability and manageability are addressed, we can illustrate HEB's usefulness dealing with an IoT-based U, Postprint (published version)
Published: 2018

28. iQ: an efficient and flexible queue-based simulation framework

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Nemirovsky, Daniel, Casas, Marc, Moreto Planas, Miquel, Valero Cortés, Mateo, and Nemirovsky, Mario
Abstract: Conventional system simulators are readily used by computer architects to design and evaluate their processor designs. These simulators provide reasonable levels of accuracy and execution detail but suffer from long simulation latencies and increased implementation complexity. In this work we propose iQ, a queue-based modeling technique that targets design space exploration and optimization studies at the core component level. iQ emulates processor elements by abstracting the implementation details into modular components composed of queue structures, delay parameters, probabilistically driven message generation, and event control. Its easy reconfigurability makes iQ a highly flexible and powerful processor simulator. We have used iQ to build an Ivy Bridge and a Core 2 Duo processor model and have validated them against real hardware running SPEC CPU2006 Int, achieving average error rates of 9.55% and 8.93%., The authors would like to thank Mauricio Breternitz, Rodolfo Milito, and Vasilis Karakostas for their helpful reviews. Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
Published: 2017

29. Fog function virtualization: A flexible solution for IoT applications

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Quiroga, Josue V., Valero Cortés, Mateo, and Nemirovsky, Mario
Abstract: Internet of Things applications must carefully assess certain crucial factors such as the real-time and largely distributed nature of the “things”. Fog Computing provides an architecture to satisfy those requirements through nodes located from near the “things” to the edge. The problem comes with the integration of the Fog nodes into current infrastructures. This process requires the development of complex software solutions and prevents Fog growth. In this paper we propose three innovations to enhance Fog: (i) a new orchestration policy, (ii) the creation of constellations of nodes, and (iii) Fog Function Virtualization (FFV). Together, they will complement Fog so that it can reach its true potential as a generic, scalable platform running multiple IoT applications simultaneously. Deploying a new service is reduced to developing the application code, which democratizes the Fog Computing paradigm through ease of deployment and cost reduction., The authors thank Rodolfo Milito for his insightful comments and revisions. Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. Josue V. Quiroga's work was supported by a Doctoral Scholarship provided by the Mexican National Council of Science and Technology (CONACyT). This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
Published: 2017

30. Disaggregated Computing. An Evaluation of Current Trends for Datacentres

Author: Barcelona Supercomputing Center, Meyer, Hugo, Sancho, Jose C., Quiroga, Josue V., Zyulkyarov, Ferad, Roca, Damian, and Nemirovsky, Mario
Abstract: Next-generation data centers will likely be based on the emerging paradigm of disaggregated function-blocks-as-a-unit, departing from the current state of mainboard-as-a-unit. Multiple functional blocks or bricks, such as compute, memory, and peripheral, will be spread through the entire system and interconnected via one or multiple high-speed networks. The amount of memory available will be very large and distributed among multiple bricks. This new architecture brings various benefits that are desirable in today’s data centers, such as fine-grained technology upgrade cycles, fine-grained resource allocation, and access to a larger amount of memory and accelerators. An analysis of the impact and benefits of memory disaggregation is presented in this paper. One of the biggest challenges when analyzing these architectures is that memory accesses should be modeled correctly in order to obtain accurate results. However, modeling every memory access would generate a high overhead that can make the simulation unfeasible for real data center applications. A model to represent and analyze memory disaggregation has been designed, and a statistics-based, queuing-based full-system simulator was developed to rapidly and accurately analyze application performance in disaggregated systems. With a mean error of 10%, simulation results indicate that the network layers may introduce overheads that degrade application performance by up to 66%. Initial results also suggest that low memory access bandwidth may degrade application performance by up to 20%., This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 687632 (dReDBox project) and TIN2015-65316-P - Computacion de Altas Prestaciones VII., Peer Reviewed, Postprint (published version)
Published: 2017

31. A Machine Learning Approach for Performance Prediction and Scheduling on Heterogeneous CPUs

Author: Nemirovsky, Daniel, primary, Arkose, Tugberk, additional, Markovic, Nikola, additional, Nemirovsky, Mario, additional, Unsal, Osman, additional, and Cristal, Adrian, additional
Published: 2017
Full Text: View/download PDF

32. iQ: An Efficient and Flexible Queue-Based Simulation Framework

Author: Roca, Damian, primary, Nemirovsky, Daniel, additional, Casas, Marc, additional, Moreto, Miquel, additional, Valero, Mateo, additional, and Nemirovsky, Mario, additional
Published: 2017
Full Text: View/download PDF

33. Fog Function Virtualization: A flexible solution for IoT applications

Author: Roca, Damian, primary, Quiroga, Josue V., additional, Valero, Mateo, additional, and Nemirovsky, Mario, additional
Published: 2017
Full Text: View/download PDF

34. Scalability of Broadcast Performance in Wireless Network-on-Chip

Author: Abadal, Sergi, primary, Mestres, Albert, additional, Nemirovsky, Mario, additional, Lee, Heekwan, additional, Gonzalez, Antonio, additional, Alarcon, Eduard, additional, and Cabellos-Aparicio, Albert, additional
Published: 2016
Full Text: View/download PDF

35. Emergent Behaviors in the Internet of Things: The Ultimate Ultra-Large-Scale System

Author: Roca, Damian, primary, Nemirovsky, Daniel, additional, Nemirovsky, Mario, additional, Milito, Rodolfo, additional, and Valero, Mateo, additional
Published: 2016
Full Text: View/download PDF

36. HFOG: Small versus Big Data

Author: Ferrer-Roca, Olga, Roca, Damian, Nemirovsky, Mario, and Milito, Rodolfo
Published: 2015
Full Text: View/download PDF

37. Scalability of broadcast performance in wireless network-on-chip

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Mestres Sugrañes, Albert, Nemirovsky, Mario, Lee, Heekwan, González Colás, Antonio María, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
Abstract: Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC., Peer Reviewed, Postprint (published version)
Published: 2016

38. Thread assignment in multicore/multithreaded processors: A statistical approach

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Radojković, Petar, Carpenter, Paul Matthew, Moretó Planas, Miquel, Cakarevic, Vladimir, Verdú Mulà, Javier, Pajuelo González, Manuel Alejandro, Cazorla Almeida, Francisco Javier, Nemirovsky, Mario, and Valero Cortés, Mateo
Abstract: © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works., The introduction of multicore/multithreaded processors, comprised of a large number of hardware contexts (virtual CPUs) that share resources at multiple levels, has made process scheduling, in particular assignment of running threads to available hardware contexts, an important aspect of system performance. Nevertheless, thread assignment of applications running on state-of-the art processors is an NP-complete problem. Over the years, numerous studies have proposed heuristic-based algorithms for thread assignment. Since the thread assignment problem is intractable, it is in general impossible to know the performance of the optimal assignment, so the room for improvement of a given algorithm is also unknown. It is therefore hard to decide whether to invest more effort and time to improve an algorithm that may already be close to optimal. In this paper, we present a statistical approach to the thread assignment problem. First, we present a method that predicts the performance of the optimal thread assignment, based on the observed performance of each thread assignment in a random sample. The method is based on Extreme Value Theory (EVT), a branch of statistics that analyses extreme deviations from the population mean. We also propose sample pruning, a method that significantly reduces the time required to apply the statistical method by reducing the number of candidate solutions that need to be measured. 
Finally, we show that, if no suitable heuristic-based algorithm is available, a sample of several thousand random thread assignments is enough to obtain, with high confidence, an assignment with performance close to optimal. The presented approach is architecture and application independent, and it can be used to address the thread assignment problem in various domains. It is especially well suited for systems in which the workload seldom changes. An example is network systems, which typically provide a constant set of services that are known in advance, with network, This work has been supported by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, the HiPEAC Network of Excellence, and by the European Research Council under the European Union’s 7th FP, ERC Grant Agreement number 321253. Miquel Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047., Peer Reviewed, Postprint (author's final draft)
Published: 2016

39. Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach

Author: Radojkovic, Petar, Carpenter, Paul M., Moreto, Miquel, Cakarevic, Vladimir, Verdu, Javier, Pajuelo, Alex, Cazorla, Francisco J., Nemirovsky, Mario, and Valero, Mateo
Abstract: The introduction of multicore/multithreaded processors, comprised of a large number of hardware contexts (virtual CPUs) that share resources at multiple levels, has made process scheduling, in particular assignment of running threads to available hardware contexts, an important aspect of system performance. Nevertheless, thread assignment of applications running on state-of-the art processors is an NP-complete problem. Over the years, numerous studies have proposed heuristic-based algorithms for thread assignment. Since the thread assignment problem is intractable, it is in general impossible to know the performance of the optimal assignment, so the room for improvement of a given algorithm is also unknown. It is therefore hard to decide whether to invest more effort and time to improve an algorithm that may already be close to optimal. In this paper, we present a statistical approach to the thread assignment problem. First, we present a method that predicts the performance of the optimal thread assignment, based on the observed performance of each thread assignment in a random sample. The method is based on Extreme Value Theory (EVT), a branch of statistics that analyses extreme deviations from the population mean. We also propose sample pruning, a method that significantly reduces the time required to apply the statistical method by reducing the number of candidate solutions that need to be measured. Finally, we show that, if no suitable heuristic-based algorithm is available, a sample of several thousand random thread assignments is enough to obtain, with high confidence, an assignment with performance close to optimal. The presented approach is architecture and application independent, and it can be used to address the thread assignment problem in various domains. It is especially well suited for systems in which the workload seldom changes. 
An example is network systems, which typically provide a constant set of services that are known in advance, with network
Published: 2016

40. Emergent behaviors in the Internet of things: The ultimate ultra-large-scale system

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Nemirovsky, Daniel, Nemirovsky, Mario, Milito, Rodolfo, and Valero Cortés, Mateo
Abstract: To reach its potential, the Internet of Things (IoT) must break down the silos that limit applications' interoperability and hinder their manageability. Doing so leads to the building of ultra-large-scale systems (ULSS) in several areas, including autonomous vehicles, smart cities, and smart grids. The scope of ULSS is both large and complex. Thus, the authors propose Hierarchical Emergent Behaviors (HEB), a paradigm that builds on the concepts of emergent behavior and hierarchical organization. Rather than explicitly programming all possible decisions in the vast space of ULSS scenarios, HEB relies on the emergent behaviors induced by local rules at each level of the hierarchy. The authors discuss the modifications to classical IoT architectures required by HEB, as well as the new challenges. They also illustrate the HEB concepts in reference to autonomous vehicles. This use case paves the way to the discussion of new lines of research., Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
Published: 2016

41. Scalability of Broadcast Performance in Wireless Network-on-Chip

Author: Barcelona Supercomputing Center, Abadal Cavallé, Sergi, Mestres Sugrañes, Albert, Nemirovsky, Mario, Lee, Heekwan, Gonzalez, Antonio, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
Abstract: Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC., The authors gratefully acknowledge support from INTEL’s Doctoral Student Honor Program, as well as from Samsung’s Global Research Outreach (GRO) program. This work has been also partially supported by the Catalan Government through a FI-AGAUR grant and by the Spanish State Ministry of Economy and Competitiveness under grant aid PCIN-2015-012., Peer Reviewed, Postprint (author's final draft)
Published: 2016

42. Improving the performance and energy-efficiency of virtual memory

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Nemirovsky, Mario, Ünsal, Osman, Cristal Kestelman, Adrián, and Karakostas, Vasileios
Abstract: Virtual memory improves programmer productivity, enhances process security, and increases memory utilization. However, virtual memory requires an address translation from the virtual to the physical address space on every memory operation. Page-based implementations of virtual memory divide physical memory into fixed size pages, and use a per-process page table to map virtual pages to physical pages. The hardware key component for accelerating address translation is the Translation Lookaside Buffer (TLB), that holds recently used mappings from the virtual to the physical address space. However, address translation still incurs high (i) performance overheads due to costly page table walks after TLB misses, and (ii) energy overheads due to frequent TLB lookups on every memory operation. This thesis quantifies these overheads and proposes techniques to mitigate them. In this thesis we argue that fixed size page-based approaches for address translation exhibit limited potential for improving TLB performance because they increase the TLB reach by a fixed amount. To overcome the limitations of such approaches, we introduce the concept of range translations and we show how they can significantly improve the performance and energy-efficiency of address translation. We first comprehensively quantify the address translation performance overhead on a collection of emerging scale-out applications. We show that address translation accounts for up to 16% of the total execution time. We find that huge pages may improve the application performance by reducing the time spent in page walks, enabling better exploitation of the available execution resources. However, the limited hardware support for huge pages in combination with the workloads' low memory locality leave ample space for performance optimizations. To reduce the performance overheads of address translation, we propose Redundant Memory Mappings (RMM). 
RMM provides an efficient alternative representation of many virtual-to, Postprint (published version)
Published: 2016

43. Range Translations for Fast Virtual Memory

Author: Gandhi, Jayneel, primary, Karakostas, Vasileios, additional, Ayar, Furkan, additional, Cristal, Adrian, additional, Hill, Mark D., additional, McKinley, Kathryn S., additional, Nemirovsky, Mario, additional, Swift, Michael M., additional, and Unsal, Osman S., additional
Published: 2016
Full Text: View/download PDF

44. Energy-efficient address translation

Author: Karakostas, Vasileios, primary, Gandhi, Jayneel, additional, Cristal, Adrian, additional, Hill, Mark D., additional, McKinley, Kathryn S., additional, Nemirovsky, Mario, additional, Swift, Michael M., additional, and Unsal, Osman S., additional
Published: 2016
Full Text: View/download PDF

45. Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach

Author: Radojkovic, Petar, primary, Carpenter, Paul M., additional, Moreto, Miquel, additional, Cakarevic, Vladimir, additional, Verdu, Javier, additional, Pajuelo, Alex, additional, Cazorla, Francisco J., additional, Nemirovsky, Mario, additional, and Valero, Mateo, additional
Published: 2016
Full Text: View/download PDF

46. Broadcast-enabled massive multicore architectures: a wireless RF approach

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Sheinman, Benny, Katz, Oded, Markish, Ofer, Elad, Danny, Fournier, Yvan, Roca, Damian, Hanzich, Mauricio, Houzeaux, Guillaume, Nemirovsky, Mario, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
Abstract: Broadcast traditionally has been regarded as a prohibitive communication transaction in multiprocessor environments. Nowadays, such a constraint largely drives the design of architectures and algorithms all-pervasive in diverse computing domains, directly and indirectly leading to diminishing performance returns as the many-core era is approaching. Novel interconnect technologies could help revert this trend by offering, among others, improved broadcast support, even in large-scale chip multiprocessors. This article outlines the prospects of wireless on-chip communication technologies pointing toward low-latency (a few cycles) and energy-efficient broadcast (a few picojoules per bit). It also discusses the challenges and potential impact of adopting these technologies as key enablers of unconventional hardware architectures and algorithmic approaches, in the pathway of significantly improving the performance, energy efficiency, scalability, and programmability of many-core chips., Peer Reviewed, Postprint (author's final draft)
Published: 2015

47. Virtualized security at the network edge: a user-centric approach

Author: Kuusijarvi, Jarkko, Sassu, Roberto, Montero Banegas, Diego Teodoro, Lioy, Antonio, Basile, Cataldo, Serral Gracià, René, Risso, Fulvio, Ciaccia, Francesco, Jacquin, Ludovic, Georgiades, Michael, Shaw, Adrian, Charalambides, Savvas, Bosco, Francesca, Nemirovsky, Mario, Yannuzzi, Marcelo, and Pastor, Antonio
Abstract: The current device-centric protection model against security threats has serious limitations. On one hand, the proliferation of user terminals such as smartphones, tablets, notebooks, smart TVs, game consoles, and desktop computers makes it extremely difficult to achieve the same level of protection regardless of the device used. On the other hand, when various users share devices (e.g., parents and kids using the same devices at home), the setup of distinct security profiles, policies, and protection rules for the different users of a terminal is far from trivial. In light of this, this article advocates for a paradigm shift in user protection. In our model, protection is decoupled from users' terminals, and it is provided by the access network through a trusted virtual domain. Each trusted virtual domain provides unified and homogeneous security for a single user irrespective of the terminal employed. We describe a user-centric model where nontechnically savvy users can define their own profiles and protection rules in an intuitive way. We show that our model can harness the virtualization power offered by next-generation access networks, especially from network functions virtualization in the points of presence at the edge of telecom operators. We also analyze the distinctive features of our model, and the challenges faced based on the experience gained in the development of a proof of concept.
Published: 2015

48. Networking challenges and prospective impact of broadcast-oriented wireless networks-on-chip

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Nemirovsky, Mario, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
Abstract: The cost of broadcast has been constraining the design of manycore processors and of the algorithms that run upon them. However, as on-chip RF technologies allow the design of small-footprint and high-bandwidth antennas and transceivers, native low-latency (a few clock cycles) and low-power (a few pJ/bit) broadcast support through wireless communication can be envisaged. In this paper, we analyze the main networking design aspects and challenges of Broadcast-oriented Wireless Network-on-Chip (BoWNoC), which are basically reduced to the development of Medium Access Control (MAC) protocols able to handle hundreds of cores. We evaluate the broadcast performance and scalability of different MAC designs, to then discuss the impact that the proposed paradigm could exert on the performance, scalability and programmability of future manycore architectures, programming models and parallel algorithms. Peer Reviewed. Postprint (published version)
Published: 2015

49. NEMsCAM: A novel CAM cell based on nano-electro-mechanical switch and CMOS for energy efficient TLBs

Author: Barcelona Supercomputing Center, Seyedi, Azam, Karakostas, Vasileios, Cosemans, Stefan, Cristal Kestelman, Adrián, Nemirovsky, Mario, and Unsal, Osman
Abstract: In this paper we propose a novel Content Addressable Memory (CAM) cell, NEMsCAM, based on both Nano-electro-mechanical (NEM) switches and CMOS technologies. The memory component of the proposed CAM cell is designed with two complementary non-volatile NEM switches and located on top of the CMOS-based comparison component. As a use case for the NEMsCAM cell, we design first-level data and instruction Translation Lookaside Buffers (TLBs) with 16nm CMOS technology at 2GHz. The simulations show that the NEMsCAM TLB reduces the energy consumption per search operation (by 27%), write operation (by 41.9%) and standby mode (by 53.9%), and the area (by 40.5%) compared to a CMOS-only TLB with minimal performance overhead. We thank all anonymous reviewers for their insightful comments. This work is supported in part by the European Union (FEDER funds) under contract TIN2012-34557, and the European Union’s Seventh Framework Programme (FP7/2007-2013) under the ParaDIME project (GA no. 318693). Postprint (author's final draft)
Published: 2015

50. On the Area and Energy Scalability of Wireless Network-on-Chip: A Model-Based Benchmarked Design Space Exploration

Author: Abadal, Sergi, primary, Iannazzo, Mario, additional, Nemirovsky, Mario, additional, Cabellos-Aparicio, Albert, additional, Lee, Heekwan, additional, and Alarcon, Eduard, additional
Published: 2015
Full Text: View/download PDF
