89 results on '"Nemirovsky, Mario"'
Search Results
2. SABES: statistical available bandwidth estimation from passive TCP measurements
- Author
- Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Serral García, René, Ciaccia, Francesco, Romero, Iván, and Nemirovsky, Mario
- Abstract
Estimating available network resources is fundamental when adapting the sending rate at both the application and transport layers. Traditional approaches rely either on active probing techniques or on iteratively adapting the average sending rate, as is the case for modern TCP congestion control algorithms. In this paper, we propose a statistical method based on inter-packet arrival time analysis of TCP acknowledgments to estimate a path's available bandwidth. SABES first estimates the bottleneck link capacity by exploiting the traffic patterns of TCP slow start. Then, a heuristic based on the capacity estimate provides an approximation of the end-to-end available bandwidth. Exhaustive experimentation in both simulations and real-world scenarios was conducted to validate our technique, and our results are promising. Furthermore, we train an artificial neural network to improve the estimation accuracy.
- Published
- 2020
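The capacity step builds on the packet-dispersion principle. Back-to-back segments leave the bottleneck spaced by roughly L/C (segment size over link capacity), so the dominant ACK inter-arrival gap reveals C. The sketch below is a minimal illustration of that principle only. It is not SABES's statistical method. The function name `estimate_capacity_bps`, the 10 µs histogram bin, and the assumption of one ACK per segment (no delayed ACKs) are all hypothetical.

```python
from collections import Counter


def estimate_capacity_bps(ack_times_s, segment_bytes, bin_s=1e-5):
    """Estimate bottleneck capacity (bits/s) from ACK arrival timestamps.

    Hypothetical assumptions, for illustration: one ACK per full-size
    segment (no delayed ACKs), and the most frequent inter-ACK gap equals
    the bottleneck transmission time of one segment, L / C.
    """
    gaps = [b - a for a, b in zip(ack_times_s, ack_times_s[1:]) if b > a]
    if not gaps:
        raise ValueError("need at least two distinct ACK timestamps")
    # Histogram the gaps into fixed-width bins. The modal bin is taken as
    # the bottleneck dispersion, which discards idle gaps between rounds.
    bins = Counter(round(g / bin_s) for g in gaps)
    modal_bin = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))[0]
    dispersion_s = max(modal_bin, 1) * bin_s
    return segment_bytes * 8 / dispersion_s
```

Take ACKs spaced 120 µs apart with 1500-byte segments. The modal gap then gives 1500 × 8 / 120 µs = 100 Mbit/s. A single long idle gap between slow-start rounds does not move the mode. A histogram mode is only one possible statistic, and the paper's estimator may differ.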
3. SABES: Statistical Available Bandwidth EStimation from passive TCP measurements
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Romero Ruiz, Ivan, Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Serral Gracià, René, and Nemirovsky, Mario
- Abstract
Estimating available network resources is fundamental when adapting the sending rate at both the application and transport layers. Traditional approaches rely either on active probing techniques or on iteratively adapting the average sending rate, as is the case for modern TCP congestion control algorithms. In this paper, we propose a statistical method based on inter-packet arrival time analysis of TCP acknowledgments to estimate a path's available bandwidth. SABES first estimates the bottleneck link capacity by exploiting the traffic patterns of TCP slow start. Then, a heuristic based on the capacity estimate provides an approximation of the end-to-end available bandwidth. Exhaustive experimentation in both simulations and real-world scenarios was conducted to validate our technique, and our results are promising. Furthermore, we train an artificial neural network to improve the estimation accuracy., This work was supported by the grant 2015DI023 as part of the Industrial PhD grants of AGAUR and Generalitat de Catalunya. Project co-financed by the Spanish Ministry of Ciencia, Innovación y Universidades with reference RTC-2017-6655-7 (FEDER)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
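During slow start, the sender releases segments in per-RTT bursts. ACKs therefore arrive in trains separated by idle gaps. Isolating those trains keeps the inter-round idle time out of the dispersion statistics. The sketch below shows one way to do this, assuming a simple boundary rule: a gap longer than a fixed fraction of the RTT starts a new train. The function names and the 0.25 threshold are hypothetical and are not taken from the paper.

```python
def split_ack_trains(ack_times_s, rtt_s, idle_fraction=0.25):
    """Group ACK arrival times into trains separated by idle periods.

    A gap longer than idle_fraction * rtt_s is treated as the boundary
    between two slow-start rounds (hypothetical heuristic).
    """
    if not ack_times_s:
        return []
    threshold = idle_fraction * rtt_s
    trains = [[ack_times_s[0]]]
    for prev, cur in zip(ack_times_s, ack_times_s[1:]):
        if cur - prev > threshold:
            trains.append([cur])      # idle gap: a new round begins
        else:
            trains[-1].append(cur)    # same burst: extend current train
    return trains


def intra_train_gaps(trains):
    """Inter-ACK gaps measured only inside each train."""
    return [b - a for t in trains for a, b in zip(t, t[1:])]
```

With a 50 ms RTT, ACKs at 0, 0.1, 50, 50.1, 50.2 and 50.3 ms split into trains of 2 and 4. This doubling matches slow start. Only the four short intra-train gaps are kept for capacity estimation.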
4. Definition of new WAN paradigms enabled by smart measurements
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Serral Gracià, René, Nemirovsky, Mario, and Ciaccia, Francesco
- Abstract
Nowadays, massive amounts of data are moved over the Internet by data-hungry applications, Big Data, and multimedia content. Combined with lower costs and greater reliability of high-speed broadband access, this means the whole Internet infrastructure faces new challenges, especially when information crosses long geographical distances. That is the case for Wide Area Networks (WANs), which enterprises with multi-site deployments typically traverse. When a connection is established between geographically distant end-points over a path with high latency and high bandwidth, data flows over a so-called Long Fat Network. Currently, transport protocols in end-points are not able to exploit the resources of such links. Notably, the most common TCP implementations still suffer from design flaws that limit their efficiency. More recent developments still suffer from low fairness in resource sharing and lack global visibility. We identify SD-WAN as an SDN use case that can enable the adoption of new transport protocols, improving traffic behavior over WANs without the need to modify the end-points. In this thesis, we explore new approaches to network measurement that will enable both end-points and SD-WAN edge routers to gain visibility over the end-to-end network status. Such additional visibility promotes the development of smarter control mechanisms for network traffic. The preliminary study carried out covers TCP behavior over WANs and existing methodologies to control its traffic patterns and enforce rate throttling. We also identify a specific use case that poses challenges for WAN scenarios: Split TCP connections in a Performance Enhancing Proxy (PEP). New control mechanisms to improve resource utilization and fairness are defined in this project.
Specifically, we propose a new approach called Receive Window Modulation (RWM) that allows edge routers to control the sending rate of a TCP connection by modifying the window advertised by the r…
Nowadays, the Internet moves considerable amounts of data because of data-intensive applications (Big Data). Combined with lower costs and greater reliability of broadband access links, the Internet infrastructure must face new challenges, especially when information has to cross large geographical distances. This is the case for Wide Area Networks (WANs), which are typically part of the infrastructure of companies with multiple sites and offices. Today, transport protocols at the end-points are not able to exploit WAN resources. The most common are TCP implementations, which still suffer from design flaws that limit their efficiency. Recent TCP developments still do not guarantee fair sharing of network resources and lack global visibility. We identify SD-WAN as a use case that can facilitate the adoption of new transport protocols, improving the behavior of network traffic over WANs without the need to modify the end-points. In this thesis we explore new proposals in the field of network measurement that allow both end-points and edge routers to gain visibility over the state of the network. This added visibility enables the development of smarter network traffic control mechanisms. We identify a specific use case that poses challenges in WAN scenarios: Split TCP connections in a Performance Enhancing Proxy (PEP). The project defines new mechanisms that improve the utilization and sharing of network resources.
Specifically, we propose a new scheme called Receive Window Modulation (RWM), which allows edge routers to control the sending rate of a TCP connection by modifying the receive window advertised by the receiver. We show that such a controller can improve the efficienc…, Postprint (published version)
- Published
- 2020
5. Evaluating University-Business Collaboration at Science Parks: a Business Perspective
- Author
- Universitat Politècnica de Catalunya. Doctorat en Administració i Direcció d'Empreses, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic, Olvera Herrera, Claudia, Piqué, Josep Ma, Cortés García, Claudio Ulises, and Nemirovsky, Mario
- Abstract
The evaluation of companies' performance at University Science Parks (SPs) is essential for identifying the needs of the companies and the feasibility of University-Business Collaboration (UBC). The companies' real needs are also of interest to universities and SPs, since these face the challenge of designing strategies that best help them transfer knowledge more effectively. This research article focuses on Key Performance Indicators (KPIs) in UBC, and on the needs and business objectives of companies co-located at SPs in Spain and Mexico. This article (i) aims to identify the KPIs in UBC used by co-located companies at SPs, and (ii) explores the KPIs in UBC and the critical success factors of SPs. This article focuses on the perspective of companies, with a secondary focus on the perspectives of SPs and universities. For this study, data was collected through online company surveys in Spain and Mexico., Postprint (published version)
- Published
- 2020
6. Advances in the Hierarchical Emergent Behaviors (HEB) approach to autonomous vehicles
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Milito, Rodolfo, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
Widespread deployment of autonomous vehicles (AVs) presents formidable challenges in terms of handling scalability and complexity, particularly regarding vehicular reactions to unforeseen corner cases. Hierarchical Emergent Behaviors (HEB) is a scalable architecture based on the concepts of emergent behaviors and hierarchical decomposition. It relies on a few simple but powerful rules to govern local vehicular interactions. Rather than requiring prescriptive programming of every possible scenario, HEB’s approach relies on global behaviors induced by the application of these local, well-understood rules. Our first two papers on HEB focused on a primal set of rules applied at the first hierarchical level. On the path to systematizing a solid design methodology, this paper proposes additional rules for the second level, uses simulations to study the resulting richer set of emergent behaviors, and discusses the communication mechanisms between the different levels., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
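The toy sketch below makes concrete the idea of a global behavior emerging from local rules. Each vehicle on a single lane sees only the vehicle ahead and applies two hypothetical rules. First, it closes the gap at a bounded speed. Second, it never gets closer than a safe gap. An evenly spaced platoon emerges without any vehicle being told to form one. These rules, and the names `step` and `simulate`, are illustrative assumptions and not the HEB rule set from the paper.

```python
def step(positions, safe_gap=5.0, max_speed=2.0):
    """Advance every vehicle one tick using only local information.

    positions are ordered front (index 0, the stationary leader) to back.
    Rule 1: move toward the vehicle ahead at up to max_speed.
    Rule 2: never move closer than safe_gap to the vehicle ahead.
    """
    new = [positions[0]]                  # the leader holds its position
    for i in range(1, len(positions)):
        ahead = new[i - 1]                # position of the vehicle ahead this tick
        gap = ahead - positions[i]
        move = min(max_speed, max(0.0, gap - safe_gap))
        new.append(positions[i] + move)
    return new


def simulate(positions, ticks=200, **rules):
    """Apply the local rules repeatedly and return the final positions."""
    for _ in range(ticks):
        positions = step(positions, **rules)
    return positions
```

Starting from irregular positions `[100.0, 60.0, 10.0, 0.0]`, the vehicles settle at `[100.0, 95.0, 90.0, 85.0]`. The uniform spacing is a property of the group that no single rule states explicitly.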
7. HIRE: Hidden Inter-packet Red-shift Effect
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Romero Ruiz, Ivan, Serral Gracià, René, and Nemirovsky, Mario
- Abstract
Over the years, different techniques have been proposed to detect the bottleneck bandwidth and available bandwidth of an end-to-end path. However, to the authors' knowledge, no work has been conducted on detecting which link or node on the path could be the narrow link. In this paper, we present a novel technique based on packet-pair dispersion analysis, whose objective is twofold. First, it estimates the narrow link capacity using a new approach that takes into account both inter-packet time and packet propagation delay. Second, it infers the specific hop in the end-to-end path that represents the narrow link. This is achieved by injecting packet trains with intermediate TTL-expiring packets, which decrease the train rate when they cross the narrow link (red-shift effect). We validate our approach in simulations, showing the tool's robustness in very complex scenarios., This work was supported by the Industrial PhD grant 2015DI023 of AGAUR and Gencat and the project Efficient Smart Multi Connected Networks co-financed by the Spanish Ministry of Ciencia, Innovación y Universidades with reference RTC-2017-6655-7, the Spanish Agencia Estatal de Investigación and the European Regional Development Fund (FEDER)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
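The narrow-link idea can be illustrated with the classic store-and-forward dispersion model. A packet pair leaving hop i is spaced by max(input gap, L/C_i), so the spacing stops growing after the narrowest link. The sketch below computes per-hop spacing for hypothetical link capacities. It then reports the hop where the spacing last grew. It illustrates only the dispersion model: it assumes no cross traffic, and it does not implement HIRE's TTL-expiring trains or its propagation-delay correction. Both function names are hypothetical.

```python
def per_hop_dispersion(capacities_bps, packet_bytes, initial_gap_s=0.0):
    """Spacing of a back-to-back packet pair after each hop.

    Store-and-forward model with no cross traffic: a hop's output gap is
    the larger of its input gap and the packet transmission time L / C.
    """
    gaps, gap = [], initial_gap_s
    for capacity in capacities_bps:
        gap = max(gap, packet_bytes * 8 / capacity)
        gaps.append(gap)
    return gaps


def narrow_link_from_gaps(per_hop_gaps_s, packet_bytes):
    """Given the pair spacing observed after each hop (hypothetically, one
    measurement per hop-limited probe), return the 1-based index of the
    narrow link and its capacity estimate in bits/s."""
    if not per_hop_gaps_s:
        raise ValueError("need at least one hop measurement")
    hop, widest = 1, per_hop_gaps_s[0]
    for i, g in enumerate(per_hop_gaps_s[1:], start=2):
        if g > widest:            # spacing grew: a narrower link was crossed
            hop, widest = i, g
    return hop, packet_bytes * 8 / widest
```

Consider a path of 1 Gbit/s, 100 Mbit/s, 10 Mbit/s and 1 Gbit/s links with 1500-byte packets. The spacing grows to 1.2 ms at hop 3 and stays there, which identifies hop 3 as the narrow link at about 10 Mbit/s.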
8. Evaluating University-Business Collaboration at Science Parks: a Business Perspective
- Author
- Olvera, Claudia (primary), Piqué, Josep M. (additional), Cortés, Ulises (additional), and Nemirovsky, Mario (additional)
- Published
- 2020
9. Improving TCP performance and reducing self-induced congestion with receive window modulation
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Ciaccia, Francesco, Arcas Abella, Oriol, Montero Banegas, Diego Teodoro, Romero Ruiz, Ivan, Milito, Rodolfo, Serral Gracià, René, and Nemirovsky, Mario
- Abstract
We present a control module for software edge routers called Receive Window Modulation (RWM). Its main objective is to mitigate what we define as self-induced congestion: the result of traffic emission patterns at the source that cause buffering and packet losses in any of the intermediate routers along the path between the connection's endpoints. The controller modifies the receiver's TCP advertised window to match the computed bandwidth-delay product, based on the connection's round-trip time estimate and the bandwidth locally available at the edge router. The controller requires no endpoint modification, so it can be deployed in corporate edge routers, increasing visibility and control capabilities. When used in real-world experiments with loss-based congestion control algorithms such as CUBIC, this scheme is shown to optimize access link utilization and per-connection goodput, and to reduce latency variability and packet losses., This work was supported by the grant 2015DI023 in the framework of the Industrial PhD grants of AGAUR and Generalitat de Catalunya., Peer Reviewed, Postprint (author's final draft)
- Published
- 2019
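As the abstract describes it, the core of the controller is an arithmetic rule. The receiver's advertised window is capped at the bandwidth-delay product (BDP) of the edge router's locally available bandwidth and the connection's RTT. The minimal sketch below implements that rule. It assumes the router has learned the connection's window-scale shift from the SYN exchange (RFC 7323). The function name is hypothetical. Actual packet rewriting, including checksum updates and per-flow state, is out of scope.

```python
import math


def modulated_window_field(orig_field, wscale, avail_bw_bps, rtt_s):
    """Return the 16-bit TCP window field to forward toward the sender.

    The advertised window in bytes is field << wscale (RFC 7323). The
    receiver's own offer is never enlarged, only lowered to the BDP, and
    a zero window from the receiver is passed through unchanged.
    """
    bdp_bytes = avail_bw_bps * rtt_s / 8
    # Round up so the scaled window still covers the whole BDP.
    bdp_field = max(1, math.ceil(bdp_bytes / (1 << wscale)))
    return min(orig_field, bdp_field, 0xFFFF)
```

Consider 50 Mbit/s of available bandwidth, a 40 ms RTT and a window-scale shift of 7. The BDP is 250,000 bytes, so a maximal field of 65535 is rewritten to 1954 (1954 × 128 = 250,112 bytes). A smaller window offered by the receiver passes through untouched. Taking the minimum preserves the receiver's own flow control.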
10. Improving TCP performance and reducing self-induced congestion with receive window modulation
- Author
- Montero Banegas, Diego Teodoro, Nemirovsky, Mario Daniel, Serral Graciá, René, Romero, Ivan, Arcas Abella, Oriol, Milito, Rodolfo, and Ciaccia, Francesco
- Abstract
We present a control module for software edge routers called Receive Window Modulation (RWM). Its main objective is to mitigate what we define as self-induced congestion: the result of traffic emission patterns at the source that cause buffering and packet losses in any of the intermediate routers along the path between the connection's endpoints. The controller modifies the receiver's TCP advertised window to match the computed bandwidth-delay product, based on the connection's round-trip time estimate and the bandwidth locally available at the edge router. The controller requires no endpoint modification, so it can be deployed in corporate edge routers, increasing visibility and control capabilities. When used in real-world experiments with loss-based congestion control algorithms such as CUBIC, this scheme is shown to optimize access link utilization and per-connection goodput, and to reduce latency variability and packet losses.
- Published
- 2019
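The controller also needs a per-connection RTT estimate, and the abstract does not say how RWM obtains it. One common option is the smoothed RTT (SRTT) estimator from RFC 6298. An edge router could feed it RTT samples taken by matching outgoing data segments to returning ACKs. This is shown here purely as an assumed stand-in, and the class name is hypothetical.

```python
class SmoothedRtt:
    """RFC 6298 SRTT/RTTVAR estimator (alpha = 1/8, beta = 1/4)."""

    ALPHA = 1 / 8
    BETA = 1 / 4

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, sample_s):
        """Fold one RTT sample (seconds) into the estimate; return SRTT."""
        if self.srtt is None:
            # The first measurement initializes both state variables.
            self.srtt = sample_s
            self.rttvar = sample_s / 2
        else:
            # RTTVAR is updated using the previous SRTT, as RFC 6298 specifies.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample_s))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample_s
        return self.srtt
```

After samples of 100 ms and 180 ms, SRTT moves only to 110 ms. This damping keeps a single delayed ACK from swinging the computed bandwidth-delay product. The resulting SRTT could serve as the RTT input to that product.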
11. The effectiveness of knowledge and technology transfer through university-business collaboration in science parks
- Author
- Universitat Politècnica de Catalunya. Departament d'Organització d'Empreses, Cortés García, Claudio Ulises, Nemirovsky, Mario, and Olvera, Claudia
- Abstract
Science and Technology Parks (STPs) facilitate the flow of knowledge and technology among universities, R&D institutions, companies and markets, and foster the creation and growth of innovation-based companies. Among the diversity of STPs, it is possible to identify two types: (i) Science Parks (SPs), which involve university shareholding, and (ii) Technology Parks (TPs), which are not owned by universities. This study considers only SPs, since they are closely linked to the university and are the bridge between a university and companies in the process of Knowledge and Technology Transfer (KTT). The evaluation of firms' performance in Science Parks is decisive in identifying the needs of the companies and the feasibility of University-Business Collaboration (UBC). The firms' real needs are also of interest to universities and Science Parks, since these face the challenge of designing strategies that best help them transfer knowledge more effectively. While previous studies have focused on tenants' innovation performance on-Park and off-Park, very little research has taken into account the heterogeneity of Parks, which may affect the firms' performance. This research paper focuses on SPs in Spain and Mexico due to data availability. This paper (i) aims to identify the Key Performance Indicators (KPIs) in UBC used by co-located companies at SPs, and (ii) explores the performance measures (KPIs) in UBC and the critical success factors of SPs. For this study, data was collected through fifty-eight online company surveys in Spain and forty-two in Mexico.
This empirical analysis uses fourteen semi-structured interviews with SP directors to explore KPIs and success factors of SPs in both countries.
Science and Technology Parks (STPs) facilitate the flow of knowledge and technology among universities, research centers, companies and markets, and foster the creation and growth of innovation-based companies. Among the diversity of STPs, two types can be identified: (i) Science Parks (SPs), in which the university holds shares, and (ii) Technology Parks (TPs), in which universities hold a minimal share. This study considers only SPs, since they are closely linked to the university and are the bridge between a university and companies in the process of knowledge and technology transfer (KTT). Evaluating company performance in science parks is decisive for identifying the needs of the companies and the feasibility of University-Business Collaboration (UBC). The real needs of companies are also of interest to universities, since these face the challenge of designing strategies that help them transfer knowledge more effectively. While previous studies have focused on measuring company innovation (on-park and off-park), very little research has taken into account the heterogeneity of SPs, which may affect company performance. This research focuses on SPs in Spain and Mexico due to data availability. This study (i) aims to identify the Key Performance Indicators (KPIs) in UBC used by companies established at SPs, and (ii) explores the metrics (KPIs) in UBC and the critical success factors of SPs.
For this study, online surveys were sent to nine SPs in Mexico and Spain, yielding fifty-eight company surveys in Spain and forty-two in Mexico. This analysis also uses qualitative research (14 semi-structured interviews with SP directors) to explore KPIs…, Postprint (published version)
- Published
- 2019
12. Evaluating University-Business Collaboration at Science Parks: a Business Perspective.
- Author
- Olvera, Claudia, Piqué, Josep M., Cortés, Ulises, and Nemirovsky, Mario
- Subjects
BUSINESS parks, KNOWLEDGE transfer, RESEARCH parks, COOPERATIVE research, CRITICAL success factor, KEY performance indicators (Management)
- Abstract
Copyright of Triple Helix is the property of Brill Academic Publishers.
- Published
- 2021
13. Tackling IoT ultra large scale systems: Fog computing in support of hierarchical emergent behaviors
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Milito, Rodolfo, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
The Internet of Things (IoT) marks a phase transition in the evolution of the Internet, distinguished by massive connectivity and interaction with the physical world. The organic evolution of IoT requires the consideration of three dimensions: scale, organization, and context. These dimensions are particularly relevant in Ultra Large Scale Systems (ULSS), of which autonomous vehicles are a prime example. Fog Computing is well positioned to support the contextual awareness and communication that are critical for ULSS. The design and orchestration of ULSS require fresh approaches and new organizing principles. A recent paper proposed Hierarchical Emergent Behaviors (HEB), an architecture that builds on established concepts of emergent behaviors and hierarchical decomposition and organization. HEB’s local rules induce emergent behaviors, i.e., useful behaviors that are not explicitly programmed. In this chapter we take a first step toward validating HEB concepts through the study of two basic self-driving car “primitives”: exiting a platoon formation, and maneuvering in anticipation of obstacles beyond the range of on-board sensors. Fog nodes provide the critical contextual information required., Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2018
14. A general guide to applying machine learning to computer architecture
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Nemirovsky, Daniel, Arkose, Tugberk, Markovic, Nikola, Nemirovsky, Mario, Unsal, Osman Sabri, Cristal Kestelman, Adrián, and Valero Cortés, Mateo
- Abstract
The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data, which would be extremely difficult to do manually, helps produce effective predictive models. Whilst computer architects have been accelerating machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve computer system performance. The work that has been conducted, however, has produced considerably promising results. The purpose of this paper is to serve as a foundational base and guide for future computer architecture research seeking to use machine learning models to improve system efficiency. We describe a method that highlights when, why, and how to utilize machine learning models for improving system performance, and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of data generation at every execution quantum and of parameter engineering. This is followed by a survey of a set of popular machine learning models. We discuss their strengths and weaknesses and evaluate implementations aimed at creating a workload performance predictor for different core types in an x86 processor. A scheduler for heterogeneous processors can then exploit the predictions to improve system throughput. The algorithms of focus are stochastic gradient descent-based linear regression, decision trees, random forests, artificial neural networks, and k-nearest neighbors., This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreement 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P)., Peer Reviewed, Postprint (published version)
- Published
- 2018
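Of the models listed, SGD-based linear regression is the simplest to show end to end. The minimal sketch below assumes each sample is a vector of counters gathered over one execution quantum on one core type. The target is a performance figure, such as IPC, on another core type. The feature choice, the data in the usage example, and the function names are hypothetical. It is plain Python with no ML library and is not the paper's implementation.

```python
import random


def train_sgd_linear(X, y, lr=0.01, epochs=500, seed=0):
    """Fit y ≈ w·x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)                  # visit samples in random order
        for i in order:
            err = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
            # The gradient of 0.5 * err**2 w.r.t. weight j is err * x_j.
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Predicted performance for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

On noise-free synthetic data generated as y = 2·x₁ − x₂ + 0.5 over a 5×5 grid in [0, 1]², the weights converge to about (2, −1) and the bias to about 0.5. Real counter data is noisy and needs feature scaling. The tree-based and neural models in the survey exist to handle such data when a linear fit is not enough.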
15. Analysis and simulation of emergent architectures for internet of things
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Nemirovsky, Mario, Valero Cortés, Mateo, and Roca Marí, Damián
- Abstract
The Internet of Things (IoT) promises a plethora of new services and applications supported by a wide range of devices, including sensors and actuators. To reach its potential, IoT must break down the silos that limit applications' interoperability and hinder their manageability. These silos result from existing deployment techniques in which each vendor sets up its own infrastructure, duplicating the hardware and increasing costs. Fog Computing can serve as the underlying platform to support IoT applications, thus avoiding the silos. Each application becomes a system formed by IoT devices (e.g., sensors and actuators), an edge infrastructure (e.g., Fog Computing), and the Cloud. To improve several aspects of human lives, different systems can interact to correlate data, obtaining functionalities that none of the systems could achieve in isolation. We can then analyze the IoT as a whole system rather than a conjunction of isolated systems. Doing so leads to the building of Ultra-Large Scale Systems (ULSS), an extension of the concept of Systems of Systems (SoS), in several verticals including Autonomous Vehicles, Smart Cities, and Smart Grids. The scope of ULSS is large in the number of things and complex in the variety of applications, volume of data, and diversity of communication patterns. To handle this scale and complexity, in this thesis we propose Hierarchical Emergent Behaviors (HEB), a paradigm that builds on the concepts of emergent behavior and hierarchical organization. Rather than explicitly programming all possible situations in the vast space of ULSS scenarios, HEB relies on emergent behaviors induced by local rules that define the interactions of the "things" among themselves and with their environment. We discuss the modifications to classical IoT architectures required by HEB, as well as the new challenges.
Once these challenges, such as scalability and manageability, are addressed, we can illustrate HEB's usefulness dealing with an IoT-based U…
The Internet of Things (IoT) promises a plethora of new services and applications enabled by a wide range of devices that include sensors and actuators. To reach its potential, IoT must overcome the silos that limit the interoperability of applications and make them hard to manage. These silos are the result of existing deployment techniques in which each vendor installs its own infrastructure and duplicates the hardware, increasing costs. Fog Computing can serve as the underlying platform to support IoT applications, thus avoiding the silos. Each application becomes a system formed by IoT devices (for example, sensors and actuators), an infrastructure (such as Fog Computing), and the cloud. To improve various aspects of human life, different systems can interact to correlate data, obtaining functionalities that none of the systems can achieve in isolation. We can then analyze the IoT as a single system rather than a conjunction of isolated systems. This perspective leads to the building of Ultra-Large Scale Systems (ULSS), an extension of the concept of Systems of Systems (SoS), in several verticals, including autonomous vehicles, Smart Cities, and Smart Grids. The scope of ULSS is vast because of the number of devices, and complex in the variety of applications, volume of data, and diversity of communication patterns. To handle this scale and complexity, in this thesis we propose Hierarchical Emergent Behaviors (HEB), a paradigm based on the concepts of emergent behavior and hierarchical organization.
Rather than explicitly programming every possible situation in the vast space of scenarios present in ULSS, HEB relies on emergent behaviors induced by local rules that define the interactions of the "things" among themselves and with their environment. We discuss the modifications to the archit…, Postprint (published version)
- Published
- 2018
16. iQ: an efficient and flexible queue-based simulation framework
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Nemirovsky, Daniel, Casas, Marc, Moreto Planas, Miquel, Valero Cortés, Mateo, and Nemirovsky, Mario
- Abstract
Conventional system simulators are readily used by computer architects to evaluate their processor designs. These simulators provide reasonable levels of accuracy and execution detail but suffer from long simulation latencies and increased implementation complexity. In this work we propose iQ, a queue-based modeling technique that targets design space exploration and optimization studies at the core component level. iQ emulates processor elements by abstracting the implementation details into modular components composed of queue structures, delay parameters, probabilistically driven message generation, and event control. Its easy reconfigurability makes iQ a highly flexible and powerful processor simulator. We have used iQ to build an Ivy Bridge and a Core 2 Duo processor model and have validated them against real hardware running SPEC CPU2006 Int, achieving average error rates of 9.55% and 8.93%., The authors would like to thank Mauricio Breternitz, Rodolfo Milito, and Vasilis Karakostas for their helpful reviews. Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2017
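The queue-based abstraction described in the iQ abstract (components as queues with delay parameters, plus probabilistic event generation) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not iQ's implementation: each component is a bounded in-order FIFO with a fixed service delay, a single hypothetical "memory" stage misses with a fixed probability, and the names `Stage` and `simulate` and all parameter values are invented.

```python
import random
from dataclasses import dataclass


@dataclass
class Stage:
    """One processor component abstracted as a bounded FIFO queue with a service delay."""
    name: str
    capacity: int            # maximum in-flight instructions held by this queue
    delay: int               # base cycles each instruction spends here (>= 1)
    is_memory: bool = False  # if True, accesses may probabilistically miss


def simulate(stages, n_insts, width, miss_prob, miss_penalty, seed=0):
    """Return instructions per cycle (IPC) for an in-order chain of queues."""
    rng = random.Random(seed)

    def service_time(stage):
        # Probabilistic event: a memory access misses with probability miss_prob.
        extra = miss_penalty if stage.is_memory and rng.random() < miss_prob else 0
        return stage.delay + extra

    queues = [[] for _ in stages]  # remaining cycles per instruction, FIFO order
    fetched = retired = cycle = 0
    last = len(stages) - 1
    while retired < n_insts:
        cycle += 1
        # Walk back to front so slots freed downstream are usable upstream this cycle.
        for i in range(last, -1, -1):
            q = queues[i]
            for j in range(len(q)):
                q[j] -= 1
            moved = 0
            while q and q[0] <= 0 and moved < width:
                if i == last:
                    retired += 1
                elif len(queues[i + 1]) < stages[i + 1].capacity:
                    queues[i + 1].append(service_time(stages[i + 1]))
                else:
                    break  # downstream queue full: head-of-line blocking
                q.pop(0)
                moved += 1
        # Fetch up to `width` new instructions into the first queue.
        for _ in range(width):
            if fetched == n_insts or len(queues[0]) >= stages[0].capacity:
                break
            queues[0].append(service_time(stages[0]))
            fetched += 1
    return n_insts / cycle
```

For example, a three-stage pipeline `[Stage("fetch", 8, 1), Stage("exec", 8, 1, is_memory=True), Stage("commit", 8, 1)]` with `width=4` and no misses approaches an IPC of 4, and raising `miss_prob` lowers it. Changing the design point only means editing capacities, delays, and probabilities, which is the kind of reconfigurability the abstract emphasizes.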
17. Fog function virtualization: A flexible solution for IoT applications
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Quiroga, Josue V., Valero Cortés, Mateo, and Nemirovsky, Mario
- Abstract
Internet of Things applications must carefully account for crucial factors such as the real-time and highly distributed nature of the “things”. Fog Computing provides an architecture that satisfies these requirements through nodes located anywhere from near the “things” up to the network edge. The problem lies in integrating Fog nodes into current infrastructures: this process requires the development of complex software solutions and hinders Fog growth. In this paper we propose three innovations to enhance Fog: (i) a new orchestration policy, (ii) the creation of constellations of nodes, and (iii) Fog Function Virtualization (FFV). Together, they complement Fog so that it can reach its true potential as a generic, scalable platform running multiple IoT applications simultaneously. Deploying a new service is reduced to developing the application code, which democratizes the Fog Computing paradigm through ease of deployment and cost reduction.
The authors thank Rodolfo Milito for his insightful comments and revisions. Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. Josue V. Quiroga's work was supported by a Doctoral Scholarship provided by the Mexican National Council of Science and Technology (CONACyT). This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P).
Peer Reviewed, Postprint (author's final draft)
- Published
- 2017
18. Disaggregated Computing. An Evaluation of Current Trends for Datacentres
- Author
-
Barcelona Supercomputing Center, Meyer, Hugo, Sancho, Jose C., Quiroga, Josue V., Zyulkyarov, Ferad, Roca, Damian, and Nemirovsky, Mario
- Abstract
Next-generation data centers will likely be based on the emerging paradigm of disaggregated function-blocks-as-a-unit, departing from the current mainboard-as-a-unit approach. Multiple functional blocks, or bricks, such as compute, memory, and peripherals will be spread across the entire system and interconnected via one or more high-speed networks. The amount of available memory will be very large and distributed among multiple bricks. This new architecture brings several benefits desirable in today's data centers, such as fine-grained technology upgrade cycles, fine-grained resource allocation, and access to larger amounts of memory and accelerators. This paper presents an analysis of the impact and benefits of memory disaggregation. One of the biggest challenges when analyzing these architectures is that memory accesses must be modeled correctly to obtain accurate results. However, modeling every memory access would generate an overhead high enough to make simulating real data center applications unfeasible. A model to represent and analyze memory disaggregation has been designed, and a statistics-based, queuing-based full-system simulator was developed to rapidly and accurately analyze application performance in disaggregated systems. With a mean error of 10%, simulation results show that the network layers may introduce overheads that degrade application performance by up to 66%. Initial results also suggest that low memory access bandwidth may degrade application performance by up to 20%.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 687632 (dReDBox project) and TIN2015-65316-P - Computacion de Altas Prestaciones VII.
Peer Reviewed, Postprint (published version)
- Published
- 2017
19. Scalability of Broadcast Performance in Wireless Network-on-Chip
- Author
-
Abadal, Sergi, Mestres, Albert, Nemirovsky, Mario, Lee, Heekwan, Gonzalez, Antonio, Alarcon, Eduard, and Cabellos-Aparicio, Albert
- Published
- 2016
20. Emergent Behaviors in the Internet of Things: The Ultimate Ultra-Large-Scale System
- Author
-
Roca, Damian, Nemirovsky, Daniel, Nemirovsky, Mario, Milito, Rodolfo, and Valero, Mateo
- Published
- 2016
21. Scalability of broadcast performance in wireless network-on-chip
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Mestres Sugrañes, Albert, Nemirovsky, Mario, Lee, Heekwan, González Colás, Antonio María, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
- Abstract
Networks-on-Chip (NoCs) are currently the paradigm of choice for interconnecting the cores of a chip multiprocessor. However, conventional NoCs may not suffice to meet the on-chip communication requirements of processors with hundreds or thousands of cores, mainly because the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, we present a Wireless Network-on-Chip (WNoC) in which all cores share a single broadband channel. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of the communication flows. To assess the feasibility of this approach, the network performance of the WNoC is analyzed as a function of system size and channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, we provide preliminary results on the potential performance of the proposed hybrid scheme, together with guidelines for the design of MAC protocols for WNoC.
Peer Reviewed, Postprint (published version)
- Published
- 2016
22. Thread assignment in multicore/multithreaded processors: A statistical approach
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Radojković, Petar, Carpenter, Paul Matthew, Moretó Planas, Miquel, Cakarevic, Vladimir, Verdú Mulà, Javier, Pajuelo González, Manuel Alejandro, Cazorla Almeida, Francisco Javier, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
© 2015 IEEE.
The introduction of multicore/multithreaded processors, composed of a large number of hardware contexts (virtual CPUs) that share resources at multiple levels, has made process scheduling, in particular the assignment of running threads to available hardware contexts, an important aspect of system performance. Nevertheless, thread assignment of applications running on state-of-the-art processors is an NP-complete problem. Over the years, numerous studies have proposed heuristic-based algorithms for thread assignment. Since the thread assignment problem is intractable, it is in general impossible to know the performance of the optimal assignment, so the room for improvement of a given algorithm is also unknown. It is therefore hard to decide whether to invest more effort and time in improving an algorithm that may already be close to optimal. In this paper, we present a statistical approach to the thread assignment problem. First, we present a method that predicts the performance of the optimal thread assignment based on the observed performance of each thread assignment in a random sample. The method is based on Extreme Value Theory (EVT), a branch of statistics that analyses extreme deviations from the population mean. We also propose sample pruning, a method that significantly reduces the time required to apply the statistical method by reducing the number of candidate solutions that need to be measured. Finally, we show that, if no suitable heuristic-based algorithm is available, a sample of several thousand random thread assignments is enough to obtain, with high confidence, an assignment with performance close to optimal. The presented approach is architecture- and application-independent, and it can be used to address the thread assignment problem in various domains. It is especially well suited for systems in which the workload seldom changes. An example is network systems, which typically provide a constant set of services that are known in advance, with network…
This work has been supported by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, the HiPEAC Network of Excellence, and by the European Research Council under the European Union’s 7th FP, ERC Grant Agreement number 321253. Miquel Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047.
Peer Reviewed, Postprint (author's final draft)
- Published
- 2016
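The claim that a few thousand random assignments suffice can be made concrete with an elementary probability bound. This is a back-of-the-envelope sketch, not the paper's EVT analysis: it assumes assignments are drawn independently and uniformly at random, and the "top fraction" $p = 10^{-3}$ with $n = 5000$ samples is an arbitrary illustrative choice.

$$
P\bigl(\text{at least one of } n \text{ samples lies in the top } p \text{ fraction}\bigr) = 1 - (1-p)^{n},
\qquad
1 - (0.999)^{5000} \approx 1 - e^{-5.0025} \approx 0.9933 .
$$

Under these assumptions, 5000 samples give roughly a 99.3% chance of hitting the best 0.1% of assignments; how close that tier is to the true optimum is exactly what the EVT-based prediction is meant to quantify.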
23. Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach
- Author
-
Radojkovic, Petar, Carpenter, Paul M., Moreto, Miquel, Cakarevic, Vladimir, Verdu, Javier, Pajuelo, Alex, Cazorla, Francisco J., Nemirovsky, Mario, and Valero, Mateo
- Abstract
The introduction of multicore/multithreaded processors, composed of a large number of hardware contexts (virtual CPUs) that share resources at multiple levels, has made process scheduling, in particular the assignment of running threads to available hardware contexts, an important aspect of system performance. Nevertheless, thread assignment of applications running on state-of-the-art processors is an NP-complete problem. Over the years, numerous studies have proposed heuristic-based algorithms for thread assignment. Since the thread assignment problem is intractable, it is in general impossible to know the performance of the optimal assignment, so the room for improvement of a given algorithm is also unknown. It is therefore hard to decide whether to invest more effort and time in improving an algorithm that may already be close to optimal. In this paper, we present a statistical approach to the thread assignment problem. First, we present a method that predicts the performance of the optimal thread assignment based on the observed performance of each thread assignment in a random sample. The method is based on Extreme Value Theory (EVT), a branch of statistics that analyses extreme deviations from the population mean. We also propose sample pruning, a method that significantly reduces the time required to apply the statistical method by reducing the number of candidate solutions that need to be measured. Finally, we show that, if no suitable heuristic-based algorithm is available, a sample of several thousand random thread assignments is enough to obtain, with high confidence, an assignment with performance close to optimal. The presented approach is architecture- and application-independent, and it can be used to address the thread assignment problem in various domains. It is especially well suited for systems in which the workload seldom changes. An example is network systems, which typically provide a constant set of services that are known in advance, with network…
- Published
- 2016
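The abstract predicts the optimal assignment's performance from a random sample using Extreme Value Theory. The sketch below illustrates the general idea under loudly stated assumptions: `throughput` is a hypothetical toy contention model (not a real workload), and the EVT step is a standard peaks-over-threshold fit of a generalized Pareto distribution by the method of moments, which may differ from the estimator the authors actually use. `exhaustive_best` exists only to sanity-check the toy case; real assignment spaces are far too large for brute force.

```python
import itertools
import random
import statistics


def throughput(assignment, pressure):
    """Hypothetical toy model: contexts 2k and 2k+1 share one core, and both
    co-runners slow down as the product of their resource pressures grows."""
    total = 0.0
    for k in range(0, len(assignment), 2):
        a, b = assignment[k], assignment[k + 1]
        total += 2.0 / (1.0 + pressure[a] * pressure[b])
    return total


def random_sample(pressure, n, seed=0):
    """Measure n uniformly random thread-to-context assignments."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        perm = list(range(len(pressure)))
        rng.shuffle(perm)
        samples.append((throughput(perm, pressure), perm))
    return samples


def exhaustive_best(pressure):
    """Ground truth by brute force; only feasible for toy sizes (n! assignments)."""
    return max(throughput(p, pressure)
               for p in itertools.permutations(range(len(pressure))))


def pot_upper_bound(values, tail_fraction=0.1):
    """Peaks-over-threshold estimate of the best achievable value.

    Exceedances over a high threshold u are fitted with a generalized Pareto
    distribution by the method of moments. A negative shape xi implies a finite
    upper endpoint u - sigma / xi, which serves as the predicted optimum."""
    ordered = sorted(values)
    u = ordered[min(len(ordered) - 1, int(len(ordered) * (1 - tail_fraction)))]
    excess = [v - u for v in ordered if v > u]
    if len(excess) < 2:
        return None  # too few exceedances to fit
    mean = statistics.fmean(excess)
    var = statistics.variance(excess)
    if var == 0:
        return None
    xi = 0.5 * (1.0 - mean * mean / var)
    sigma = 0.5 * mean * (mean * mean / var + 1.0)
    return u - sigma / xi if xi < 0 else None  # xi >= 0: no finite endpoint
```

The best sampled assignment is `max(random_sample(pressure, 2000))`, and `pot_upper_bound` applied to the sampled values gives the predicted ceiling; the gap between the two is the "room for improvement" the abstract refers to. With heavily tied, discrete performance values (as in this toy model) the fit can legitimately return `None`.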
24. Emergent behaviors in the Internet of things: The ultimate ultra-large-scale system
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Roca, Damian, Nemirovsky, Daniel, Nemirovsky, Mario, Milito, Rodolfo, and Valero Cortés, Mateo
- Abstract
To reach its potential, the Internet of Things (IoT) must break down the silos that limit applications' interoperability and hinder their manageability. Doing so leads to the building of ultra-large-scale systems (ULSS) in several areas, including autonomous vehicles, smart cities, and smart grids. The scope of ULSS is both large and complex. Thus, the authors propose Hierarchical Emergent Behaviors (HEB), a paradigm that builds on the concepts of emergent behavior and hierarchical organization. Rather than explicitly programming all possible decisions in the vast space of ULSS scenarios, HEB relies on the emergent behaviors induced by local rules at each level of the hierarchy. The authors discuss the modifications to classical IoT architectures required by HEB, as well as the new challenges it raises. They also illustrate the HEB concepts with reference to autonomous vehicles, a use case that paves the way for the discussion of new lines of research.
Damian Roca's work was supported by a Doctoral Scholarship provided by Fundación La Caixa. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493) and by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P).
Peer Reviewed, Postprint (author's final draft)
- Published
- 2016
25. Scalability of Broadcast Performance in Wireless Network-on-Chip
- Author
-
Barcelona Supercomputing Center, Abadal Cavallé, Sergi, Mestres Sugrañes, Albert, Nemirovsky, Mario, Lee, Heekwan, Gonzalez, Antonio, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
- Abstract
Networks-on-Chip (NoCs) are currently the paradigm of choice for interconnecting the cores of a chip multiprocessor. However, conventional NoCs may not suffice to meet the on-chip communication requirements of processors with hundreds or thousands of cores, mainly because the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, we present a Wireless Network-on-Chip (WNoC) in which all cores share a single broadband channel. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of the communication flows. To assess the feasibility of this approach, the network performance of the WNoC is analyzed as a function of system size and channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, we provide preliminary results on the potential performance of the proposed hybrid scheme, together with guidelines for the design of MAC protocols for WNoC.
The authors gratefully acknowledge support from INTEL’s Doctoral Student Honor Program, as well as from Samsung’s Global Research Outreach (GRO) program. This work has also been partially supported by the Catalan Government through an FI-AGAUR grant and by the Spanish State Ministry of Economy and Competitiveness under grant aid PCIN-2015-012.
Peer Reviewed, Postprint (author's final draft)
- Published
- 2016
26. Improving the performance and energy-efficiency of virtual memory
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Nemirovsky, Mario, Ünsal, Osman, Cristal Kestelman, Adrián, and Karakostas, Vasileios
- Abstract
Virtual memory improves programmer productivity, enhances process security, and increases memory utilization. However, virtual memory requires an address translation from the virtual to the physical address space on every memory operation. Page-based implementations of virtual memory divide physical memory into fixed-size pages and use a per-process page table to map virtual pages to physical pages. The key hardware component for accelerating address translation is the Translation Lookaside Buffer (TLB), which holds recently used mappings from the virtual to the physical address space. However, address translation still incurs high (i) performance overheads due to costly page table walks after TLB misses, and (ii) energy overheads due to frequent TLB lookups on every memory operation. This thesis quantifies these overheads and proposes techniques to mitigate them. In this thesis we argue that fixed-size page-based approaches to address translation have limited potential for improving TLB performance because they increase the TLB reach only by a fixed amount. To overcome the limitations of such approaches, we introduce the concept of range translations and show how they can significantly improve the performance and energy efficiency of address translation. We first comprehensively quantify the address translation performance overhead on a collection of emerging scale-out applications. We show that address translation accounts for up to 16% of the total execution time. We find that huge pages may improve application performance by reducing the time spent in page walks, enabling better exploitation of the available execution resources. However, the limited hardware support for huge pages, combined with the workloads' low memory locality, leaves ample room for performance optimization. To reduce the performance overheads of address translation, we propose Redundant Memory Mappings (RMM). RMM provides an efficient alternative representation of many virtual-to…
Virtual memory increases programmer productivity, provides security to processes, and increases memory utilization. However, virtual memory requires an address translation between the virtual and physical address spaces on every memory operation. Paged virtual memory implementations divide physical memory into fixed-size pages. The main component for accelerating address translation is the TLB (Translation Lookaside Buffer). However, address translation has a high performance cost, due to the need to search the page table after a TLB miss, and a high energy cost due to frequent TLB lookups (one per memory operation). In this thesis we argue that page-based translation mechanisms have limited potential for increasing TLB performance, mainly because the set of addresses the TLB can translate can only be increased by a limited amount. To overcome these limitations, we introduce the concept of range translations and show how this mechanism can significantly improve the performance and energy efficiency of address translation. First, we quantify the performance loss due to translation in emerging applications that scale well as more processors are added. We show that in these applications address translation is responsible for up to 16% of the execution time. We also show that large pages can improve application performance, enabling better use of the available resources. However, the limited hardware support for large pages, combined with workloads with poor locality, leaves ample room for optimization. To reduce the performance costs of address translation, we propose RMM (Redundant Memory Mappings). RMM is based on page ranges and…
Postprint (published version)
- Published
- 2016
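Range translations, as described in the abstract, map an arbitrarily large contiguous virtual range to contiguous physical memory with a single base/limit/offset entry instead of one entry per fixed-size page. The minimal lookup sketch below shows only that translation arithmetic, not RMM's hardware design; it assumes non-overlapping ranges, and the names `RangeTranslation`, `RangeTable`, and `pages_to_cover` are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeTranslation:
    base: int    # first virtual address in the range (inclusive)
    limit: int   # end of the range (exclusive)
    offset: int  # physical address = virtual address + offset


class RangeTable:
    """Sorted, non-overlapping range translations with binary-search lookup."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges, key=lambda r: r.base)
        self.bases = [r.base for r in self.ranges]

    def translate(self, va):
        i = bisect.bisect_right(self.bases, va) - 1  # last range with base <= va
        if i >= 0 and va < self.ranges[i].limit:
            return va + self.ranges[i].offset
        return None  # miss: a real system would fall back to the page table walk


def pages_to_cover(r, page_size=4096):
    """Number of fixed-size page entries needed to map the same range."""
    return -(-(r.limit - r.base) // page_size)  # ceiling division
```

For instance, a single entry `RangeTranslation(0x10000, 0x50000, 0x200000)` translates any address in its 256 KiB span, whereas covering the same span with 4 KiB pages would take 64 entries. This is the sense in which range translations grow TLB reach with the size of the mapping rather than by a fixed amount per entry.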
27. Range Translations for Fast Virtual Memory
- Author
-
Gandhi, Jayneel, Karakostas, Vasileios, Ayar, Furkan, Cristal, Adrian, Hill, Mark D., McKinley, Kathryn S., Nemirovsky, Mario, Swift, Michael M., and Unsal, Osman S.
- Published
- 2016
28. Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach
- Author
-
Radojkovic, Petar, Carpenter, Paul M., Moreto, Miquel, Cakarevic, Vladimir, Verdu, Javier, Pajuelo, Alex, Cazorla, Francisco J., Nemirovsky, Mario, and Valero, Mateo
- Published
- 2016
29. Broadcast-enabled massive multicore architectures: a wireless RF approach
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Sheinman, Benny, Katz, Oded, Markish, Ofer, Elad, Danny, Fournier, Yvan, Roca, Damian, Hanzich, Mauricio, Houzeaux, Guillaume, Nemirovsky, Mario, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
- Abstract
Broadcast has traditionally been regarded as a prohibitive communication transaction in multiprocessor environments. Today, this constraint largely drives the design of architectures and algorithms across diverse computing domains, directly and indirectly leading to diminishing performance returns as the many-core era approaches. Novel interconnect technologies could help reverse this trend by offering, among other things, improved broadcast support, even in large-scale chip multiprocessors. This article outlines the prospects of wireless on-chip communication technologies, which point toward low-latency (a few cycles) and energy-efficient (a few picojoules per bit) broadcast. It also discusses the challenges and potential impact of adopting these technologies as key enablers of unconventional hardware architectures and algorithmic approaches, on the path toward significantly improving the performance, energy efficiency, scalability, and programmability of many-core chips.
Peer Reviewed, Postprint (author's final draft)
- Published
- 2015
30. Virtualized security at the network edge: a user-centric approach
- Author
-
Kuusijarvi, Jarkko, Sassu, Roberto, Montero Banegas, Diego Teodoro, Lioy, Antonio, Basile, Cataldo, Serral Gracià, René, Risso, Fulvio, Ciaccia, Francesco, Jacquin, Ludovic, Georgiades, Michael, Shaw, Adrian, Charalambides, Savvas, Bosco, Francesca, Nemirovsky, Mario, Yannuzzi, Marcelo, and Pastor, Antonio
- Abstract
The current device-centric model of protection against security threats has serious limitations. On the one hand, the proliferation of user terminals such as smartphones, tablets, notebooks, smart TVs, game consoles, and desktop computers makes it extremely difficult to achieve the same level of protection regardless of the device used. On the other hand, when several users share devices (e.g., parents and kids using the same devices at home), setting up distinct security profiles, policies, and protection rules for the different users of a terminal is far from trivial. In light of this, this article advocates a paradigm shift in user protection. In our model, protection is decoupled from users' terminals and is provided by the access network through a trusted virtual domain. Each trusted virtual domain provides unified and homogeneous security for a single user irrespective of the terminal employed. We describe a user-centric model in which non-technically-savvy users can define their own profiles and protection rules intuitively. We show that our model can harness the virtualization power offered by next-generation access networks, especially network functions virtualization in the points of presence at the edge of telecom operators' networks. We also analyze the distinctive features of our model and the challenges faced, based on the experience gained in developing a proof of concept.
- Published
- 2015
31. Networking challenges and prospective impact of broadcast-oriented wireless networks-on-chip
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Nemirovsky, Mario, Alarcón Cot, Eduardo José, and Cabellos Aparicio, Alberto
- Abstract
The cost of broadcast has been constraining the design of manycore processors and of the algorithms that run on them. However, as on-chip RF technologies enable small-footprint, high-bandwidth antennas and transceivers, native low-latency (a few clock cycles) and low-power (a few pJ/bit) broadcast support through wireless communication can be envisaged. In this paper, we analyze the main networking design aspects and challenges of Broadcast-oriented Wireless Network-on-Chip (BoWNoC), which essentially reduce to the development of Medium Access Control (MAC) protocols able to handle hundreds of cores. We evaluate the broadcast performance and scalability of different MAC designs and then discuss the impact that the proposed paradigm could have on the performance, scalability, and programmability of future manycore architectures, programming models, and parallel algorithms.
Peer Reviewed, Postprint (published version)
- Published
- 2015
32. NEMsCAM: A novel CAM cell based on nano-electro-mechanical switch and CMOS for energy efficient TLBs
- Author
-
Barcelona Supercomputing Center, Seyedi, Azam, Karakostas, Vasileios, Cosemans, Stefan, Cristal Kestelman, Adrián, Nemirovsky, Mario, and Unsal, Osman
- Abstract
In this paper we propose a novel Content Addressable Memory (CAM) cell, NEMsCAM, based on both Nano-electro-mechanical (NEM) switches and CMOS technologies. The memory component of the proposed CAM cell is designed with two complementary non-volatile NEM switches and located on top of the CMOS-based comparison component. As a use case for the NEMsCAM cell, we design first-level data and instruction Translation Lookaside Buffers (TLBs) in 16 nm CMOS technology at 2 GHz. The simulations show that, compared to a CMOS-only TLB, the NEMsCAM TLB reduces energy consumption per search operation by 27%, per write operation by 41.9%, and in standby mode by 53.9%, and reduces area by 40.5%, with minimal performance overhead.
We thank all anonymous reviewers for their insightful comments. This work is supported in part by the European Union (FEDER funds) under contract TIN2012-34557, and the European Union’s Seventh Framework Programme (FP7/2007-2013) under the ParaDIME project (GA no. 318693).
Postprint (author's final draft)
- Published
- 2015
33. Virtualized security at the network edge: a user-centric approach
- Author
-
Montero, Diego, Yannuzzi, Marcelo, Shaw, Adrian, Jacquin, Ludovic, Pastor, Antonio, Serral-Gracia, Rene, Lioy, Antonio, Risso, Fulvio, Basile, Cataldo, Sassu, Roberto, Nemirovsky, Mario, Ciaccia, Francesco, Georgiades, Michael, Charalambides, Savvas, Kuusijarvi, Jarkko, and Bosco, Francesca
- Published
- 2015
34. High level queuing architecture model for high-end processors
- Author
-
Nemirovsky, Mario, Moreto Planas, Miquel, and Roca Marí, Damián
- Abstract
We have developed a new kind of simulator based on queueing models and statistical methods. It enables fast and accurate simulation, which makes it well suited to rapid design space exploration. We have validated the model against a real chip, the Intel Ivy Bridge processor.
- Published
- 2014
35. Key ingredients in an IoT recipe: fog computing, cloud computing, and more fog computing
- Author
-
Yannuzzi, Marcelo, Serral Gracià, René, Nemirovsky, Mario, Montero Banegas, Diego Teodoro, and Milito, Rodolfo A.
- Abstract
This paper examines some of the most promising and challenging scenarios in IoT, and shows why current compute and storage models confined to data centers will not be able to meet the requirements of many of the applications foreseen for those scenarios. Our analysis centers on three interrelated requirements: 1) mobility; 2) reliable control and actuation; and 3) scalability, especially in IoT scenarios that span large geographical areas and require real-time decisions based on data analytics. Based on our analysis, we explain why Fog Computing is the natural platform for IoT, and discuss the unavoidable interplay of the Fog and the Cloud in the coming years. In the process, we review some of the technologies that will require considerable advances to support the applications that the IoT market will demand.
- Published
- 2014
36. On the area and energy scalability of wireless network-on-chip: a model-based benchmarked design space exploration
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Iannazzo Soteras, Mario Enrique, Nemirovsky, Mario, Cabellos Aparicio, Alberto, Lee, Heekwan, and Alarcón Cot, Eduardo José
- Abstract
Networks-on-Chip (NoCs) are emerging as the way to interconnect the processing cores and the memory within a chip multiprocessor. As recent years have seen a significant increase in the number of cores per chip, it is crucial to guarantee the scalability of NoCs to prevent communication from becoming the next performance bottleneck in multicore processors. Among other alternatives, the concept of Wireless Network-on-Chip (WNoC) has been proposed, wherein on-chip antennas would provide native broadcast capabilities leading to enhanced network performance. Since energy consumption and chip area are the two primary constraints, this work explores the area and energy implications of scaling a WNoC in terms of (a) the number of cores within the chip, and (b) the capacity of each link in the network. To this end, an integral design space exploration is performed, covering implementation aspects (area and energy), communication aspects (link capacity) and network-level considerations (number of cores and network architecture). The study is entirely based on analytical models, which make it possible to benchmark WNoC scalability against a baseline NoC. Ultimately, this investigation will provide qualitative and quantitative guidelines for the design of future transceivers for wireless on-chip communication. Peer Reviewed, Postprint (author’s final draft)
- Published
- 2014
37. Measuring operating system overhead on Sun UltraSparc T1 processor
- Author
-
Radojković, Petar, Cakarevic, Vladimir, Verdú Mulà, Javier, Pajuelo González, Manuel Alejandro, Gioiosa, Roberto, Cazorla Almeida, Francisco Javier, Nemirovsky, Mario, Valero Cortés, Mateo, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, and Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
- Subjects
Multicore multithreaded processors, Software_OPERATINGSYSTEMS, Overhead, Operating systems (Computers), Linux, Operating systems (Computers), Computer science::Computer architecture [UPC subject areas], Computer science::Operating systems [UPC subject areas], Solaris (Computer file), Operating systems
- Abstract
Numerous studies have shown that Operating System (OS) noise is one of the reasons for significant performance degradation in clustered architectures. Although many studies examine OS noise for High Performance Computing, especially in multi-processor/multi-core systems, most of them focus on 2- or 4-core systems. In this study, we analyze sources of OS noise on a massive multithreading processor, the Sun UltraSPARC T1. We compare results measured in Linux and Solaris with the results provided by a low-overhead runtime environment that introduces almost no overhead in applications’ execution time. Our results show that the overhead introduced by the OS timer interrupt in Linux and Solaris depends on the particular core and hardware context in which the application is running. This overhead is up to 30% when the application is executed on the same hardware context as the timer interrupt handler, and up to 10% when the application and the timer interrupt handler run on different contexts but on the same core. We detect no overhead when the benchmark and the timer interrupt handler run on different cores of the processor.
- Published
- 2009
38. Graphene-Enabled Wireless Communication for Massive Multicore Architectures
- Author
-
Abadal, Sergi, Alarcon, Eduard, Cabellos-Aparicio, Albert, Lemme, Max C., and Nemirovsky, Mario
- Abstract
Current trends in microprocessor architecture design are leading towards a dramatic increase in core-level parallelization, wherein a given number of independent processors or cores are interconnected. Since the main bottleneck is expected to shift from computation to communication, efficient and scalable means of inter-core communication are crucial for guaranteeing steady performance improvements in many-core processors. As the number of cores grows, it remains unclear whether initial proposals, such as the Network-on-Chip (NoC) paradigm, will meet the stringent requirements of this scenario. This position paper presents a new research area where massive multicore architectures have wireless communication capabilities at the core level. This goal is feasible by using graphene-based planar antennas, which can radiate signals in the Terahertz band while occupying less chip area than their metallic counterparts. To the best of our knowledge, this is the first work that discusses the use of graphene-enabled wireless communication for massive multicore processors. Such wireless systems enable broadcasting, multicasting, and all-to-all communication, and significantly reduce many of the issues present in massively multicore environments, such as data coherency, consistency, synchronization and communication problems. Several open research challenges related to implementation, communications and multicore architectures are pointed out, paving the way for future research in this multidisciplinary area. QC 20131220
- Published
- 2013
39. Improving the energy efficiency of hardware-assisted watchpoint systems
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Karakostas, Vasileios, Tomić, Saša, Unsal, Osman Sabri, Nemirovsky, Mario, and Cristal Kestelman, Adrián
- Abstract
Hardware-assisted watchpoint systems enhance the execution of numerous dynamic software techniques, such as memory protection, module isolation, deterministic execution, and data race detection. In this paper, we show that previous hardware proposals may introduce significant energy overheads, and we propose WatchPoint Filtering (WPF), a novel filtering mechanism that eliminates unnecessary watchpoint checks. We evaluate WPF on two state-of-the-art proposals for hardware-assisted watchpoints using two common memory checkers. On average, WPF eliminates 83% of the watchpoint checks (up to 99.7%) and reduces the dynamic energy overhead by 57% (up to 78%), without introducing additional execution-time overhead. Postprint (published version)
- Published
- 2013
40. Improving the effective use of multithreaded architectures: implications on compilation, thread assignment, and timing analysis
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Cazorla Almeida, Francisco Javier, Verdú Mulà, Javier, Pajuelo González, Manuel Alejandro, Nemirovsky, Mario, and Radojković, Petar
- Abstract
This thesis presents cross-domain approaches that improve the effective use of multithreaded architectures. The contributions of the thesis can be classified into three groups. First, we propose several methods for thread assignment of network applications running in multithreaded network servers. Second, we analyze the problem of graph partitioning, which is part of the compilation process of multithreaded streaming applications. Finally, we present a method that improves the measurement-based timing analysis of multithreaded architectures used in time-critical environments. The following sections summarize each of the contributions. (1) Thread assignment on multithreaded processors: State-of-the-art multithreaded processors have different levels of resource sharing (e.g., resources shared between threads running on the same core, and globally shared resources). Thus, the way the threads of a given workload are assigned to the processors' hardware contexts determines which resources the threads share, which, in turn, may significantly affect system performance. In this thesis, we demonstrate the importance of thread assignment for network applications running in multithreaded servers. We also present TSBSched and BlackBox scheduler, methods for thread assignment of multithreaded network applications running on processors with several levels of resource sharing. Finally, we propose a statistical approach to the thread assignment problem. In particular, we show that running a sample of several hundred or several thousand random thread assignments is sufficient to capture, with very high probability, at least one assignment from the best-performing 1%. We also describe a method that estimates the optimal system performance for a given workload. We successfully applied TSBSched, BlackBox scheduler, and the presented statistical approach to a case study of thread assignment of multithreaded network applications running on the UltraSPARC T2 processor.
(2) Kernel partitioning of streami, Postprint (published version)
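A minimal sketch of the probability argument behind the sampling claim in the thesis abstract above, under an assumption of ours that the abstract does not state: each random thread assignment is drawn independently and uniformly, so every draw lands in the best-performing 1% with probability p = 0.01. No draw among n lands there with probability (1 - p)^n, so at least one does with probability 1 - (1 - p)^n. The function names are hypothetical, chosen for illustration.

```python
import math


def prob_hit_top_fraction(n_samples: int, top_fraction: float = 0.01) -> float:
    """Probability that at least one of n independent, uniformly random
    assignments falls within the best-performing `top_fraction`.

    Assumption: each draw hits the top fraction with probability `top_fraction`
    (sampling with replacement from a large assignment space).
    """
    if not 0.0 < top_fraction < 1.0:
        raise ValueError("top_fraction must be in (0, 1)")
    # P(no sample in top fraction) = (1 - p)^n, hence P(at least one) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - top_fraction) ** n_samples


def samples_needed(confidence: float, top_fraction: float = 0.01) -> int:
    """Smallest n with prob_hit_top_fraction(n, top_fraction) >= confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Solve 1 - (1 - p)^n >= c  <=>  n >= log(1 - c) / log(1 - p).
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


if __name__ == "__main__":
    for n in (100, 300, 1000, 3000):
        print(n, round(prob_hit_top_fraction(n), 4))
    print("samples for 99% confidence:", samples_needed(0.99))
```

Under these assumptions, 300 samples already give roughly a 95% chance of hitting the top 1%, and 459 samples suffice for 99%, which is consistent with the "several hundred or several thousand" range quoted in the abstract.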
- Published
- 2013
41. Thread assignment of multithreaded network applications in multicore/multithreaded processors
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Radojković, Petar, Cakarevic, Vladimir, Verdú Mulà, Javier, Pajuelo González, Manuel Alejandro, Cazorla Almeida, Francisco Javier, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The introduction of multithreaded processors comprising a large number of cores with many shared resources makes thread scheduling, and in particular the optimal assignment of running threads to processor hardware contexts, one of the most promising ways to improve system performance. However, finding optimal thread assignments for workloads running on state-of-the-art multicore/multithreaded processors is an NP-complete problem. In this paper, we propose BlackBox scheduler, a systematic method for thread assignment of multithreaded network applications running on multicore/multithreaded processors. The method requires minimal information about the target processor architecture and no data about the hardware requirements of the applications under study. The proposed method is evaluated with an industrial case study for a set of multithreaded network applications running on the UltraSPARC T2 processor. In most of the experiments, the proposed thread assignment method detected the best actual thread assignment in the evaluation sample. The method improved system performance by 5 to 48 percent with respect to the load balancing algorithms used in state-of-the-art OSs, and by up to 60 percent with respect to a naive thread assignment. Peer Reviewed, Postprint (author's final draft)
- Published
- 2013
42. Communication bottleneck analysis on big data applications
- Author
-
Nemirovsky, Mario, Solé Pareta, Josep, and Roca Marí, Damián
- Abstract
[English] Computers, and multicore processors in particular, need cache memory to improve memory bandwidth and overall performance. There are different types of cache (private and shared), organized into different levels of hierarchy. Keeping shared values coherent and consistent across these caches is a major performance bottleneck on multicore systems. To address this issue, several protocols invalidate or update these values when a core needs to modify them. However, these protocols require broadcast (or similar) communication, which in today's NoCs is costly in terms of cycles. To address this bottleneck, the first step in this research is to estimate the impact of these invalidations on system performance. Obtaining that estimate requires programs or simulators that model a real process running on a multicore/multithreaded processor, so that the communication between cores and the effects of sharing part of the address space can be observed. This is because these invalidations are produced by keeping different copies of the same variable (shared space) coherent. Once we have a simulator that exposes this communication, we can create different configurations to emulate a real processor in different scenarios. From these cases, we can determine how the number of invalidations changes depending on the parameters of the system (number of cores, size of cache memories, etc.) and the applications being run. Results can therefore vary across applications, since each uses the shared memory space differently. With this information we compile statistics to draw initial conclusions and lay the groundwork for future work.
These results also enable us to study the scalability of current models, to see what would happen with processors of more than 1000 cores, because current simulators do not support such high number o, [Spanish] Multicore chips are now the reality in the field of computing. However, these systems present many problems that limit their potential. This project studies the main one: the memory system and, more specifically, the cache memory. It studies the scalability of current solutions with respect to the number of cores and draws the relevant conclusions., [Catalan] The current trend in processors is to integrate multiple cores within the same chip. These are known as multicore chips (CMPs), but their design is full of problems. This project studies these problems, focusing on the memory system and, more specifically, on the cache memory. In particular, it analyzes how coherence protocols work and how they scale with the number of cores. Finally, it concludes that current solutions are not suitable for a large number of cores.
- Published
- 2013
43. Area and laser power scalability analysis in photonic networks-on-chip
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. GCO - Grup de Comunicacions Òptiques, Universitat Politècnica de Catalunya. EPIC - Energy Processing and Integrated Circuits, Abadal Cavallé, Sergi, Cabellos Aparicio, Alberto, Lázaro Villa, José Antonio, Nemirovsky, Mario, Alarcón Cot, Eduardo José, and Solé Pareta, Josep
- Abstract
In the last decade, the field of microprocessor architecture has seen the rise of multicore processors, which consist of a set of independent processing units, or cores, interconnected on the same chip. As the number of cores per multiprocessor increases, the bandwidth and energy requirements of their interconnection networks grow exponentially, and conventional on-chip wires are not expected to meet such demands. Alternatively, nanophotonics has been regarded as a strong candidate for on-chip communication, since it could provide high bandwidth with low area and energy footprints. However, issues such as the unavailability of efficient on-chip light sources or the difficulty of implementing all-optical buffering or header processing hinder the development of scalable photonic on-chip networks. In this paper, the area and laser power of several photonic on-chip network proposals are analytically modeled and their scalability is evaluated. Also, a graphene-based hybrid wireless/optical-wired approach is presented, aiming to enable end-to-end photonic on-chip networks to scale beyond thousands of cores. Peer Reviewed, Postprint (published version)
- Published
- 2013
44. Thread assignment of multithreaded network applications in multicore/multithreaded processors
- Author
-
Radojkovic, Petar, Cakarevic, Vladimir, Verdu, Javier, Pajuelo, Alex, Cazorla, Francisco J., Nemirovsky, Mario, and Valero, Mateo
- Abstract
The introduction of multithreaded processors comprising a large number of cores with many shared resources makes thread scheduling, and in particular the optimal assignment of running threads to processor hardware contexts, one of the most promising ways to improve system performance. However, finding optimal thread assignments for workloads running on state-of-the-art multicore/multithreaded processors is an NP-complete problem. In this paper, we propose BlackBox scheduler, a systematic method for thread assignment of multithreaded network applications running on multicore/multithreaded processors. The method requires minimal information about the target processor architecture and no data about the hardware requirements of the applications under study. The proposed method is evaluated with an industrial case study for a set of multithreaded network applications running on the UltraSPARC T2 processor. In most of the experiments, the proposed thread assignment method detected the best actual thread assignment in the evaluation sample. The method improved system performance by 5 to 48 percent with respect to the load balancing algorithms used in state-of-the-art OSs, and by up to 60 percent with respect to a naive thread assignment. © 1990-2012 IEEE.
- Published
- 2013
45. Thread Assignment of Multithreaded Network Applications in Multicore/Multithreaded Processors
- Author
-
Radojkovic, Petar, Cakarevic, Vladimir, Verdu, Javier, Pajuelo, Alex, Cazorla, Francisco J., Nemirovsky, Mario, and Valero, Mateo
- Published
- 2013
46. Graphene-enabled wireless communication for massive multicore architectures
- Author
-
Abadal, Sergi, Alarcón, Eduard, Cabellos-Aparicio, Albert, Lemme, Max, and Nemirovsky, Mario
- Published
- 2013
47. An abstraction methodology for the evaluation of multi-core multi-threaded architectures
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CNDS - Xarxes de Computadors i Sistemes Distribuïts, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Zilan, Ruken, Verdú Mulà, Javier, García Vidal, Jorge, Nemirovsky, Mario, Milito, Rodolfo, and Valero Cortés, Mateo
- Abstract
As multi-core multi-threaded processors continue to evolve, the complexity of performing an extensive trade-off analysis increases proportionally. Cycle-accurate or trace-driven simulators are too slow to execute the large number of experiments required to obtain indicative results. A thorough analysis of the system also requires software benchmarks or traces. In many cases, when analysis is needed most, during the early stages of processor design, benchmarks or traces are not available. Analytical models overcome these limitations but do not provide the fine-grained details needed for a deep analysis of these architectures. In this work we present a new methodology to abstract processor architectures at a level between cycle-accurate and analytical simulators. To apply our methodology we use queueing modeling techniques. Thus, we introduce Q-MAS, a queueing-based tool targeting a real chip (the UltraSPARC T2 processor) and aimed at facilitating the quantification of trade-offs during the design phase of multi-core multi-threaded processor architectures. The results demonstrate that Q-MAS provides accurate results, very close to those of the actual hardware, at a minimal cost for running what-if scenarios. Peer Reviewed, Postprint (published version)
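For readers unfamiliar with queueing models, the sketch below shows a textbook M/M/1 queue (one server, Poisson arrivals, exponential service), which could stand in for a single shared processor resource in a what-if study. This is an illustrative assumption only: the abstract does not describe Q-MAS's internal model, and the class and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MM1Queue:
    """Textbook M/M/1 queue (hypothetical stand-in for one shared resource).

    arrival_rate: mean request arrival rate (lambda), e.g. requests per cycle.
    service_rate: mean service rate (mu), in the same units.
    """

    arrival_rate: float
    service_rate: float

    def utilization(self) -> float:
        # rho = lambda / mu: fraction of time the server is busy.
        return self.arrival_rate / self.service_rate

    def _check_stable(self) -> None:
        # The steady-state formulas below only hold when rho < 1.
        if self.utilization() >= 1.0:
            raise ValueError("unstable queue: arrival_rate must be < service_rate")

    def mean_in_system(self) -> float:
        # L = rho / (1 - rho): mean number of requests queued or in service.
        self._check_stable()
        rho = self.utilization()
        return rho / (1.0 - rho)

    def mean_response_time(self) -> float:
        # W = 1 / (mu - lambda): mean time from arrival to completion.
        self._check_stable()
        return 1.0 / (self.service_rate - self.arrival_rate)


if __name__ == "__main__":
    # What-if sweep: raise the offered load on a resource serving 1 request/cycle.
    for load in (0.2, 0.5, 0.8, 0.9):
        q = MM1Queue(arrival_rate=load, service_rate=1.0)
        print(f"rho={q.utilization():.1f}  W={q.mean_response_time():.2f} cycles")
```

A what-if sweep then amounts to varying `arrival_rate` for a fixed `service_rate`; response time grows sharply as utilization approaches 1, and the results satisfy Little's law (L = lambda * W). Evaluating such closed-form models is far cheaper than cycle-accurate simulation, which is the trade-off the abstract describes.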
- Published
- 2011
48. Thread to strand binding of parallel network applications in massive multi-threaded systems
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Radojković, Petar, Cakarevic, Vladimir, Verdú Mulà, Javier, Pajuelo González, Manuel Alejandro, Cazorla Almeida, Francisco Javier, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
In processors with several levels of hardware resource sharing, such as CMPs in which each core is an SMT, the scheduling process becomes more complex than in processors with a single level of resource sharing, such as pure-SMT or pure-CMP processors. Once the operating system selects the set of applications to schedule simultaneously on the processor (workload), each application/thread must be assigned to one of the hardware contexts (strands). We call this last scheduling step Thread to Strand Binding, or TSB. In this paper, we show that TSB has a high impact on the performance of processors with several levels of shared resources. We measure a variation of up to 59% between different TSBs of real multithreaded network applications running on the UltraSPARC T2 processor, which has three levels of resource sharing. In our view, this problem will become more acute in future multithreaded architectures comprising more cores, more contexts per core, and more levels of resource sharing. We propose a resource-sharing aware TSB algorithm (TSBSched) that significantly simplifies thread to strand binding for software-pipelined applications, which are representative of multithreaded network applications. Our systematic approach encapsulates both the characteristics of the multithreaded processors under study and the structure of the software-pipelined applications. Once calibrated for a given processor architecture, our proposal requires neither hardware knowledge on the part of the programmer nor extensive profiling of the application. We validate our algorithm on the UltraSPARC T2 processor running a set of real multithreaded network applications, for which we report improvements of up to 46% compared to current state-of-the-art dynamic schedulers. Peer Reviewed, Postprint (published version)
- Published
- 2010
49. Internet traffic and the behavior of processing workloads
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CNDS - Xarxes de Computadors i Sistemes Distribuïts, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Zilan, Ruken, Verdú Mulà, Javier, García Vidal, Jorge, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
The evolution of network services provided at the edge of the Internet increases the requirements of network applications. As a result, processors need to execute more complex workloads that deal not only with the packet header but also with the packet payload (e.g., Deep Packet Inspection). Unlike common routing applications, which process all packets similarly, next-generation network applications show variations in how individual packets are processed. Thus, different traffic behaviors can produce different processing patterns with different memory and processing requirements. This work presents ongoing research toward correlating Internet traffic features with variations in processing workloads on next-generation edge routers. Peer Reviewed, Postprint (published version)
- Published
- 2009
50. Measuring operating system overhead on Sun UltraSparc T1 processor
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Radojković, Petar, Cakarevic, Vladimir, Verdú Mulà, Javier, Pajuelo González, Manuel Alejandro, Gioiosa, Roberto, Cazorla Almeida, Francisco Javier, Nemirovsky, Mario, and Valero Cortés, Mateo
- Abstract
Numerous studies have shown that Operating System (OS) noise is one of the reasons for significant performance degradation in clustered architectures. Although many studies examine OS noise for High Performance Computing, especially in multi-processor/multi-core systems, most of them focus on 2- or 4-core systems. In this study, we analyze sources of OS noise on a massive multithreading processor, the Sun UltraSPARC T1. We compare results measured in Linux and Solaris with the results provided by a low-overhead runtime environment that introduces almost no overhead in applications’ execution time. Our results show that the overhead introduced by the OS timer interrupt in Linux and Solaris depends on the particular core and hardware context in which the application is running. This overhead is up to 30% when the application is executed on the same hardware context as the timer interrupt handler, and up to 10% when the application and the timer interrupt handler run on different contexts but on the same core. We detect no overhead when the benchmark and the timer interrupt handler run on different cores of the processor. Postprint (published version)
- Published
- 2009
Discovery Service for Jio Institute Digital Library