183 results for "High-Performance Computing (HPC)"
Search Results
152. Multi-GPU based 3D numerical modeling of fluid migration and clay dehydration influence on Lusi hydrothermal activity (Java, Indonesia).
- Author
-
Sohrabi, Reza, Malvoisin, Benjamin, Mazzini, Adriano, and Miller, Stephen A.
- Subjects
- *HYDROTHERMAL deposits, *GRAPHICS processing units, *CLAY, *DEHYDRATION, *GEOTHERMAL resources, *SEISMIC waves
- Abstract
The Lusi mud eruption in East Java has been active since May 2006. Magma emplacement at depth, clay dehydration, and mud liquefaction during seismic wave propagation have been invoked as mechanisms fueling this eruption. However, the respective roles of these processes are still poorly constrained. In this focused study, we numerically investigate the influence of clay dehydration and of mass and heat transport on fluid outflow at the Lusi site using a fully coupled 3D model for this active system. Using a multi-GPU parallel processing algorithm, we propose an estimate of the 3D time evolution of pressure, temperature, porosity, permeability and water liberation in a large-scale (9 km × 14 km × 5.5 km) deep hydrothermal system at high resolution. Simulations indicate that high-pressure fluids generated by dehydration reactions are sufficient to induce hydro-fractures that would significantly influence the porosity and permeability structures. Dehydration is an essential component for understanding the Lusi system, because the fluids generated contribute to the outflow and may have a considerable impact on the maintenance of the infrastructure required to keep the Lusi site safe. High-Performance Computing (HPC) offers high-resolution simulations for studying the time evolution of such natural systems, and potentially for geothermal resource development for the surrounding population. Highlights: • Couplings between fluid flow, heat transport and reactions in the Lusi system solved with multi-GPU technology • High-Performance Computing with 3D numerical models using multi-GPU processing • Fluid pressure build-up due to clay dehydration controls hydrothermal system fluid outflow [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
153. Knowledge Structure of the Application of High-Performance Computing: A Co-Word Analysis.
- Author
-
Lee, Kiwon and Lee, Suchul
- Abstract
As high-performance computing (HPC) plays a key role in the Fourth Industrial Revolution, the application of HPC in various industries is becoming increasingly important. Several studies have reviewed the research trends of HPC but considered only the functional aspects, causing limitations when discussing the application. Thus, this study aims to identify the knowledge structure of the application of HPC, enabling practical and policy support in various industrial fields. Co-word analysis is mainly used to establish the knowledge structure. We first collected 28,941 published papers related to HPC applications and built a co-word network from their author keywords. We performed centrality analysis and cluster analysis of the co-word network; as a result, we derived the major keywords and 18 areas of HPC applications. To validate the knowledge structure, we conducted a case study to find opportunities for HPC research plans in the research community. As a result, we discovered 17 new research topics and prioritized them through expert interviews and the Analytic Hierarchy Process. The findings of this study contribute to an understanding of the application of HPC, to exploring promising research fields for technological and social development, and to supporting research plans for successful technology commercialization. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
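The co-word workflow summarized in result 153 (keyword co-occurrence network, centrality analysis, cluster analysis) can be made concrete with a short sketch. The snippet below is only an illustration under assumed inputs: the three toy keyword lists, the use of the networkx library, and greedy-modularity clustering are choices made for the example, not details taken from the paper.

```python
# Minimal co-word analysis sketch (toy data; not the pipeline of result 153).
from itertools import combinations
import networkx as nx

papers = [
    ["HPC", "GPU", "CFD"],
    ["HPC", "machine learning", "GPU"],
    ["HPC", "CFD", "turbulence"],
]

G = nx.Graph()
for keywords in papers:
    # Every pair of keywords appearing in the same paper adds or strengthens an edge.
    for u, v in combinations(sorted(set(keywords)), 2):
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

# Centrality analysis: which keywords sit at the core of the network?
centrality = nx.degree_centrality(G)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))

# Cluster analysis: communities of co-occurring keywords approximate application areas.
communities = nx.algorithms.community.greedy_modularity_communities(G, weight="weight")
for i, community in enumerate(communities):
    print(f"cluster {i}: {sorted(community)}")
```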
154. Implementación y evaluación de un 'particle mover' usando distintas precisiones en operaciones de coma flotante
- Author
-
Palacios Márquez, Francisco José
- Subjects
Computer and Information Sciences, Computación de altas prestaciones (HPC), Precisión mixta, Mixed precision, Grado en Ingeniería Informática-Grau en Enginyeria Informàtica, GPU, CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL, IPIC3D, Data- och informationsvetenskap, Simulación, High-performance computing (HPC), Simulation
- Abstract
[EN] Computer simulations are broadly used nowadays in order to obtain information that would be impossible to gain otherwise. These computational workloads have grown in size to the point where even a small improvement in their implementation can lead to a substantial speed-up. For this reason, researchers have studied the impact of using less precise number representations, since doing so accelerates calculations. Likewise, this study aims to determine the feasibility of using mixed-precision arithmetic in iPIC3D, a 3D implicit Particle-in-Cell (PIC) implementation used in plasma simulations. Specifically, the use of mixed-precision numbers is limited to the particle mover, the section that solves the equations of motion for each particle of the plasma, which is the most time-consuming part of the code. The results show a maximum divergence, or error, of about 2% between the original implementation and the new one when comparing the output values. These results were obtained from a relatively short simulation of 2250 cycles, so the error could increase for longer runs. Thus, we come to the conclusion that in most cases the loss in precision is too high to justify the use of this new implementation., [ES] Las simulaciones por ordenador son ampliamente usadas a día de hoy para obtener información que sería imposible de conseguir de otra forma. Estas simulaciones se han hecho cada vez más grandes y complejas y cualquier mejora en el coste temporal de las mismas es algo muy importante. Así, investigadores de todo el mundo están intentando crear nuevos algoritmos y mejorar los que ya existen. Uno de los campos que se están desarrollando para intentar mejorar algoritmos existentes es el que estudia el efecto que tiene el uso de operaciones de coma flotante de menor precisión, puesto que estas operaciones son más rápidas. Asimismo, este estudio busca cuantificar el impacto que conlleva el uso de operaciones de simple precisión en iPIC3D, una implementación implícita y 3D del método "Particle-in-cell" (PIC), permitiendo llevar a cabo simulaciones de plasma. El uso de números con simple precisión se limitará al "particle mover", la sección del código que resuelve las ecuaciones de movimiento para cada partícula de plasma, siendo la parte del código más costosa a nivel de tiempo. Los resultados obtenidos muestran un máximo de divergencia del 2% entre la implementación original y la nueva cuando se comparan los valores finales. Todo ello, realizando una simulación de 2250 ciclos, lo cual es relativamente poco y, por lo tanto, con tareas que requieran un número elevado de ciclos este error podría aumentar. Por todo esto, concluimos que en la mayoría de casos, la pérdida de precisión es demasiado alta como para justificar el uso de la nueva implementación.
- Published
- 2019
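Result 154 compares a double-precision particle mover against a reduced-precision one and reports a maximum divergence of about 2% after 2250 cycles. The sketch below only illustrates how such a comparison can be set up: it uses a deliberately simplified leapfrog push in a made-up linear electric field, not the iPIC3D mover, and the particle count, field, and error metric are assumptions for the example.

```python
# Toy precision-divergence experiment: identical particle push in float64 vs float32.
# This is NOT the iPIC3D particle mover; the field E(x) = -x is invented for illustration.
import numpy as np

def push(dtype, n_particles=1024, n_cycles=2250, dt=1e-3, qm=1.0):
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, n_particles).astype(dtype)
    v = rng.uniform(-1, 1, n_particles).astype(dtype)
    dt, qm = dtype(dt), dtype(qm)
    for _ in range(n_cycles):
        e_field = -x                      # toy linear restoring field
        v = v + qm * e_field * dt         # velocity update
        x = x + v * dt                    # position update
    return x

x64 = push(np.float64)
x32 = push(np.float32).astype(np.float64)
max_rel_div = np.max(np.abs(x64 - x32) / np.maximum(np.abs(x64), 1e-12))
print(f"maximum relative divergence after 2250 cycles: {max_rel_div:.2%}")
```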
155. Venturing into the Heart of High-Performance Computing Simulations.
- Author
-
Heller, Arnie
- Subjects
HEART beat, COMPUTER simulation, CARDIOID, ELECTRIC properties of heart cells, SUPERCOMPUTERS
- Abstract
The article focuses on a study conducted by researchers from the Lawrence Livermore National Laboratory on high-performance computing (HPC) simulations that will mimic the beating of a human heart in real time. It says that the new simulation was made possible by the highly scalable Cardioid code, which replicates the human heart's electrophysiology. It also mentions that the heart simulation was developed and run on the Sequoia supercomputer at the laboratory. INSET: Expanding Laboratory Collaborations... as Well.
- Published
- 2012
156. The LS-STAG immersed boundary/cut-cell method for non-Newtonian flows in 3D extruded geometries
- Author
-
Yoann Cheny, Farhad Nikfarjam, Olivier Botella, Laboratoire Énergies et Mécanique Théorique et Appliquée (LEMTA ), and Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Discretization, Non-Newtonian fluids, Rotational symmetry, General Physics and Astronomy, Computational fluid dynamics, 01 natural sciences, 010305 fluids & plasmas, Physics::Fluid Dynamics, 0103 physical sciences, Newtonian fluid, [PHYS.MECA.MEFL]Physics [physics]/Mechanics [physics]/Fluid mechanics [physics.class-ph], 0101 mathematics, Immersed boundary methods, Physics, business.industry, Turbulence, Numerical analysis, Laminar flow, Mechanics, Non-Newtonian fluid, 010101 applied mathematics, Hardware and Architecture, Computational fluid dynamics (CFD), business, Cut-cell methods, High-performance computing (HPC), [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
- Abstract
The LS-STAG method is an immersed boundary/cut-cell method for viscous incompressible flows based on the staggered MAC arrangement for Cartesian grids, where the irregular boundary is sharply represented by its level-set function; this results in a significant gain in computer resources (wall time, memory usage) compared to commercial body-fitted CFD codes. The 2D version of the LS-STAG method is now well-established (Cheny and Botella, 2010), and this paper presents its extension to 3D geometries with translational symmetry in the z direction (hereinafter called 3D extruded configurations). This intermediate step towards the fully 3D implementation can be applied to a wide variety of canonical flows and will be regarded as the keystone for the full 3D solver, since both discretization and implementation issues on distributed memory machines are tackled at this stage of development. The LS-STAG method is then applied to various Newtonian and non-Newtonian flows in 3D extruded geometries (axisymmetric pipe, circular cylinder, duct with an abrupt expansion) for which benchmark results and experimental data are available. The purpose of these investigations is (a) to investigate the formal order of accuracy of the LS-STAG method, (b) to assess the versatility of the method for flow applications at various regimes (Newtonian and shear-thinning fluids, steady and unsteady laminar to turbulent flows), and (c) to compare its performance with well-established numerical methods (body-fitted and immersed boundary methods).
- Published
- 2018
- Full Text
- View/download PDF
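Results 156, 159 and 160 all rely on the same idea: the irregular boundary is described by a level-set function on a Cartesian MAC grid, and the cells crossed by the zero level set become cut cells. The sketch below only shows that cell-classification step for a circular cylinder in a unit box; the grid size, the signed-distance function and the corner-sign test are assumptions for the example, and none of the LS-STAG discretization itself is reproduced.

```python
# Classify Cartesian cells as fluid, solid, or cut from a level-set function.
# Illustration only: a circular cylinder of radius 0.25 centred in a unit box.
import numpy as np

n = 64                                    # cells per direction
xc = np.linspace(0.0, 1.0, n + 1)         # cell-corner coordinates
X, Y = np.meshgrid(xc, xc, indexing="ij")
phi = np.sqrt((X - 0.5) ** 2 + (Y - 0.5) ** 2) - 0.25   # signed distance to the cylinder

# Level-set values at the four corners of every cell.
corners = np.stack([phi[:-1, :-1], phi[1:, :-1], phi[:-1, 1:], phi[1:, 1:]])

solid = np.all(corners <= 0.0, axis=0)    # cell entirely inside the body
fluid = np.all(corners > 0.0, axis=0)     # cell entirely in the fluid
cut = ~solid & ~fluid                     # the zero level set crosses the cell

print(f"fluid: {fluid.sum()}  solid: {solid.sum()}  cut: {cut.sum()}")
```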
157. A Massively Parallel Hybrid Finite Volume/Finite Element Scheme for Computational Fluid Dynamics.
- Author
-
Río-Martín, Laura, Busto, Saray, and Dumbser, Michael
- Subjects
- *COMPUTATIONAL fluid dynamics, *SHALLOW-water equations, *MACH number, *NAVIER-Stokes equations, *CONJUGATE gradient methods, *INCOMPRESSIBLE flow, *RAYLEIGH-Benard convection, *COMPRESSIBLE flow
- Abstract
In this paper, we propose a novel family of semi-implicit hybrid finite volume/finite element schemes for computational fluid dynamics (CFD), in particular for the approximate solution of the incompressible and compressible Navier-Stokes equations, as well as for the shallow water equations on staggered unstructured meshes in two and three space dimensions. The key features of the method are the use of an edge-based/face-based staggered dual mesh for the discretization of the nonlinear convective terms at the aid of explicit high resolution Godunov-type finite volume schemes, while pressure terms are discretized implicitly using classical continuous Lagrange finite elements on the primal simplex mesh. The resulting pressure system is symmetric positive definite and can thus be very efficiently solved at the aid of classical Krylov subspace methods, such as a matrix-free conjugate gradient method. For the compressible Navier-Stokes equations, the schemes are by construction asymptotic preserving in the low Mach number limit of the equations, hence a consistent hybrid FV/FE method for the incompressible equations is retrieved. All parts of the algorithm can be efficiently parallelized, i.e., the explicit finite volume step as well as the matrix-vector product in the implicit pressure solver. Concerning parallel implementation, we employ the Message-Passing Interface (MPI) standard in combination with spatial domain decomposition based on the free software package METIS. To show the versatility of the proposed schemes, we present a wide range of applications, starting from environmental and geophysical flows, such as dambreak problems and natural convection, over direct numerical simulations of turbulent incompressible flows to high Mach number compressible flows with shock waves. An excellent agreement with exact analytical, numerical or experimental reference solutions is achieved in all cases. Most of the simulations are run with millions of degrees of freedom on thousands of CPU cores. We show strong scaling results for the hybrid FV/FE scheme applied to the 3D incompressible Navier-Stokes equations, using millions of degrees of freedom and up to 4096 CPU cores. The largest simulation shown in this paper is the well-known 3D Taylor-Green vortex benchmark run on 671 million tetrahedral elements on 32,768 CPU cores, showing clearly the suitability of the presented algorithm for the solution of large CFD problems on modern massively parallel distributed memory supercomputers. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
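The pressure system in result 157 is symmetric positive definite and is solved with a matrix-free conjugate gradient method, meaning the solver only needs a routine that applies the operator to a vector. The sketch below shows a generic matrix-free CG applied to a 1D Laplacian with homogeneous Dirichlet boundaries; the operator, right-hand side and tolerance are illustrative assumptions and have nothing to do with the hybrid FV/FE discretization of the paper.

```python
# Generic matrix-free conjugate gradient: only a mat-vec routine is needed.
# Illustrative operator: 1D Laplacian with zero Dirichlet boundaries (SPD).
import numpy as np

def apply_laplacian(u, h):
    up = np.pad(u, 1)                               # zero boundary values
    return (2.0 * up[1:-1] - up[:-2] - up[2:]) / h**2

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 200
h = 1.0 / (n + 1)
b = np.ones(n)                                      # constant source term
x = conjugate_gradient(lambda u: apply_laplacian(u, h), b)
print("residual norm:", np.linalg.norm(b - apply_laplacian(x, h)))
```

The abstract notes that the matrix-vector product in the implicit pressure solver is parallelized with MPI; in a distributed CG of this form, the dot products are the only steps that need global reductions.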
158. NUMA-Aware DGEMM Based on 64-Bit ARMv8 Multicore Processors Architecture.
- Author
-
Zhang, Wei, Jiang, Zihao, Chen, Zhiguang, Xiao, Nong, and Ou, Yang
- Subjects
MULTICORE processors, MATRIX multiplications, ENERGY consumption, SCALABILITY
- Abstract
Double-precision general matrix multiplication (DGEMM) is an essential kernel for measuring the potential performance of an HPC platform. ARMv8-based systems-on-chip (SoCs) have become candidates for next-generation HPC systems thanks to their highly competitive performance and energy efficiency. Therefore, it is meaningful to design high-performance DGEMM for ARMv8-based SoCs. However, as ARMv8-based SoCs integrate more and more cores, modern CPUs use non-uniform memory access (NUMA). NUMA restricts the performance and scalability of DGEMM when many threads access remote NUMA domains. This poses a challenge for developing high-performance DGEMM on multi-NUMA architectures. We present a NUMA-aware method to reduce the number of cross-die and cross-chip memory access events. The critical enabler for NUMA-aware DGEMM is to leverage two levels of parallelism, between and within NUMA nodes, in a purely threaded implementation, which allows task independence and data localization on the NUMA nodes. We have implemented NUMA-aware DGEMM in OpenBLAS and evaluated it on a dual-socket server with 48-core processors based on the Kunpeng920 architecture. The results show that NUMA-aware DGEMM effectively reduces the number of cross-die and cross-chip memory accesses, significantly enhancing the scalability of DGEMM and increasing its performance by 17.1% on average, with the most remarkable improvement being 21.9%. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
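Result 158 builds on two levels of parallelism: the matrix is first partitioned across NUMA nodes and then across the cores within each node, so that each thread mostly touches data local to its node. The Python sketch below mimics only that partitioning logic (rows of C assigned per node, then split among that node's workers); it performs no real NUMA placement or thread pinning, which the paper handles inside OpenBLAS on a Kunpeng920 server, and the node and worker counts are assumptions.

```python
# Two-level partitioning sketch for a NUMA-aware GEMM (conceptual illustration only).
# Python cannot pin threads or allocate on a given NUMA node; this only shows how
# C = A @ B can be split per NUMA node (level 1) and per worker thread (level 2).
from concurrent.futures import ThreadPoolExecutor
import numpy as np

NUM_NODES = 2            # assumed number of NUMA nodes
WORKERS_PER_NODE = 4     # assumed cores used per node

def gemm_block(A_rows, B, C, row_slice):
    # Each worker computes a contiguous block of rows of C from "its" rows of A.
    C[row_slice] = A_rows @ B

def two_level_gemm(A, B):
    C = np.zeros((A.shape[0], B.shape[1]))
    node_chunks = np.array_split(np.arange(A.shape[0]), NUM_NODES)       # level 1
    with ThreadPoolExecutor(max_workers=NUM_NODES * WORKERS_PER_NODE) as pool:
        futures = []
        for rows in node_chunks:
            for sub in np.array_split(rows, WORKERS_PER_NODE):           # level 2
                if sub.size:
                    sl = slice(int(sub[0]), int(sub[-1]) + 1)
                    futures.append(pool.submit(gemm_block, A[sl], B, C, sl))
        for f in futures:
            f.result()
    return C

A, B = np.random.rand(512, 256), np.random.rand(256, 384)
assert np.allclose(two_level_gemm(A, B), A @ B)
```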
159. Extension de la méthode LS-STAG de type frontière immergée/cut-cell aux géométries 3D extrudées : applications aux écoulements newtoniens et non newtoniens
- Author
-
Nikfarjam, Farhad, Laboratoire Énergies et Mécanique Théorique et Appliquée (LEMTA ), Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Université de Lorraine, Salaheddine Skali-Lami, and Olivier Botella
- Subjects
Computational Fluid Dynamics (CFD), Fluides non-newtoniens, Calcul à haute performance, Non-Newtonian fluids, Méthode cut-cell, Méthode frontière immergée, Immersed boundary methods, Cut-cell methods, High-Performance Computing (HPC), Mécanique des Fluides Numérique (MFN), [SPI.MECA.MEFL]Engineering Sciences [physics]/Mechanics [physics.med-ph]/Fluids mechanics [physics.class-ph]
- Abstract
The LS-STAG method is an immersed boundary/cut-cell method for viscous incompressible flows based on the staggered MAC arrangement for Cartesian grids where the irregular boundary is sharply represented by its level-set function. This approach results in a significant gain in computer resources compared to commercial body-fitted CFD codes. The 2D version of LS-STAG method is now well-established and this manuscript presents its extension to 3D geometries with translational symmetry in the z direction (3D extruded configurations). This intermediate step will be regarded as the milestone for the full 3D solver, since both discretization and implementation issues on distributed memory machines are tackled at this stage of development. The LS-STAG method is then applied to Newtonian and non-Newtonian flows in 3D extruded geometries (axisymmetric pipe, circular cylinder, duct with an abrupt expansion, etc.) for which benchmark results and experimental data are available. The purpose of these investigations is to evaluate the accuracy of LS-STAG method, to assess the versatility of method for flow applications at various regimes (Newtonian and shear-thinning fluids, steady and unsteady laminar to turbulent flows, granular flows) and to compare its performance with well-established numerical methods (body-fitted and immersed boundary methods); La méthode LS-STAG est une méthode de type frontière immergée/cut-cell pour le calcul d’écoulements visqueux incompressibles qui est basée sur la méthode MAC pour grilles cartésiennes décalées, où la frontière irrégulière est nettement représentée par sa fonction level-set, résultant en un gain significatif en ressources informatiques par rapport aux codes MFN commerciaux utilisant des maillages qui épousent la géométrie. La version 2D est maintenant bien établie et ce manuscrit présente son extension aux géométries 3D avec une symétrie translationnelle dans la direction z (configurations extrudées 3D). Cette étape intermédiaire sera considérée comme la clé de voûte du solveur 3D complet, puisque les problèmes de discrétisation et d’implémentation sur les machines à mémoire distribuée sont abordés à ce stade de développement. La méthode LS-STAG est ensuite appliquée à divers écoulements newtoniens et non-newtoniens dans des géométries extrudées 3D (conduite axisymétrique, cylindre circulaire, conduite cylindrique avec élargissement brusque, etc.) pour lesquels des résultats de références et des données expérimentales sont disponibles. Le but de ces investigations est d’évaluer la précision de la méthode LS-STAG, d’évaluer la polyvalence de la méthode pour les applications d’écoulement dans différents régimes (fluides newtoniens et rhéofluidifiants, écoulement laminaires stationnaires et instationnaires, écoulements granulaires) et de comparer ses performances avec de méthodes numériques bien établies (méthodes non structurées et de frontières immergées)
- Published
- 2018
160. Extension of the LS-STAG immersed boundary/cut-cell method to 3D extruded geometries : Application to Newtonian and non-Newtonian flows
- Author
-
Nikfarjam, Farhad, Laboratoire Énergies et Mécanique Théorique et Appliquée (LEMTA ), Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Université de Lorraine, Salaheddine Skali-Lami, Olivier Botella, and STAR, ABES
- Subjects
Computational Fluid Dynamics (CFD), Fluides non-newtoniens, Calcul à haute performance, Non-Newtonian fluids, [SPI.MECA.MEFL] Engineering Sciences [physics]/Mechanics [physics.med-ph]/Fluids mechanics [physics.class-ph], Méthode cut-cell, Méthode frontière immergée, Immersed boundary methods, Cut-cell methods, High-Performance Computing (HPC), Mécanique des Fluides Numérique (MFN), [SPI.MECA.MEFL]Engineering Sciences [physics]/Mechanics [physics.med-ph]/Fluids mechanics [physics.class-ph]
- Abstract
The LS-STAG method is an immersed boundary/cut-cell method for viscous incompressible flows based on the staggered MAC arrangement for Cartesian grids where the irregular boundary is sharply represented by its level-set function. This approach results in a significant gain in computer resources compared to commercial body-fitted CFD codes. The 2D version of LS-STAG method is now well-established and this manuscript presents its extension to 3D geometries with translational symmetry in the z direction (3D extruded configurations). This intermediate step will be regarded as the milestone for the full 3D solver, since both discretization and implementation issues on distributed memory machines are tackled at this stage of development. The LS-STAG method is then applied to Newtonian and non-Newtonian flows in 3D extruded geometries (axisymmetric pipe, circular cylinder, duct with an abrupt expansion, etc.) for which benchmark results and experimental data are available. The purpose of these investigations is to evaluate the accuracy of LS-STAG method, to assess the versatility of method for flow applications at various regimes (Newtonian and shear-thinning fluids, steady and unsteady laminar to turbulent flows, granular flows) and to compare its performance with well-established numerical methods (body-fitted and immersed boundary methods), La méthode LS-STAG est une méthode de type frontière immergée/cut-cell pour le calcul d’écoulements visqueux incompressibles qui est basée sur la méthode MAC pour grilles cartésiennes décalées, où la frontière irrégulière est nettement représentée par sa fonction level-set, résultant en un gain significatif en ressources informatiques par rapport aux codes MFN commerciaux utilisant des maillages qui épousent la géométrie. La version 2D est maintenant bien établie et ce manuscrit présente son extension aux géométries 3D avec une symétrie translationnelle dans la direction z (configurations extrudées 3D). Cette étape intermédiaire sera considérée comme la clé de voûte du solveur 3D complet, puisque les problèmes de discrétisation et d’implémentation sur les machines à mémoire distribuée sont abordés à ce stade de développement. La méthode LS-STAG est ensuite appliquée à divers écoulements newtoniens et non-newtoniens dans des géométries extrudées 3D (conduite axisymétrique, cylindre circulaire, conduite cylindrique avec élargissement brusque, etc.) pour lesquels des résultats de références et des données expérimentales sont disponibles. Le but de ces investigations est d’évaluer la précision de la méthode LS-STAG, d’évaluer la polyvalence de la méthode pour les applications d’écoulement dans différents régimes (fluides newtoniens et rhéofluidifiants, écoulement laminaires stationnaires et instationnaires, écoulements granulaires) et de comparer ses performances avec de méthodes numériques bien établies (méthodes non structurées et de frontières immergées)
- Published
- 2018
161. Exploration d’algorithmes de traitement parallèle de graphes sur architectures distribuées
- Author
-
Collet, Julien, Heuristique et Diagnostic des Systèmes Complexes [Compiègne] (Heudiasyc), Université de Technologie de Compiègne (UTC)-Centre National de la Recherche Scientifique (CNRS), Université de Technologie de Compiègne, Jacques Carlier, and Renaud Sirdey
- Subjects
Big Data, [SPI.OTHER]Engineering Sciences [physics]/Other, GraphLab, High-performance data analytics, Parallel and distributed computing, [INFO.INFO-OH]Computer Science [cs]/Other [cs.OH], Programming models, Modèles de programmation, Performance monitoring, High-performance computing (HPC), Graph-processing
- Abstract
With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graph-processing algorithms, whose parallelization and complexity are highly data-dependent, hampers usability. Moreover, the difficult evaluation of the performance behavior of these applications complicates the assessment of the relevance of the architecture used. With this in mind, this thesis work deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state-of-the-art graph-processing framework. Two use-cases are considered. For each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can eventually be leveraged to appropriately select a relevant operating point with respect to the datasets processed and the cluster nodes used. A further study enables a performance comparison of commodity cluster architectures and higher-end compute servers using the two use-cases previously developed. This study highlights the particular relevance of using clustered commodity workstations, which are considerably cheaper and simpler with respect to node architecture, over higher-end systems in this applicative context. Then, this thesis work explores how performance studies are helpful in cluster design for graph-processing. In particular, studying the throughput performance of a graph-processing system gives fruitful insights for further node architecture improvements. Moreover, this work shows that a more in-depth performance analysis can lead to guidelines for the appropriate sizing of a cluster for a given workload, paving the way toward resource allocation for graph-processing. Finally, hardware improvements for next generations of graph-processing servers are proposed and evaluated. A flash-based victim-swap mechanism is proposed for the mitigation of unwanted overloaded operations. Then, the relevance of ARM-based microservers for graph-processing is investigated with a port of GraphLab on an NVIDIA TX2-based architecture. Avec l'explosion du volume de données produites chaque année, les applications du domaine du traitement de graphes ont de plus en plus besoin d'être parallélisées et déployées sur des architectures distribuées afin d'adresser le besoin en mémoire et en ressource de calcul. Si de telles architectures larges échelles existent, issue notamment du domaine du calcul haute performance (HPC), la complexité de programmation et de déploiement d'algorithmes de traitement de graphes sur de telles cibles est souvent un frein à leur utilisation. De plus, la difficile compréhension, a priori, du comportement en performances de ce type d'applications complexifie également l'évaluation du niveau d'adéquation des architectures matérielles avec de tels algorithmes. Dans ce contexte, ces travaux de thèses portent sur l'exploration d'algorithmes de traitement de graphes sur architectures distribuées en utilisant GraphLab, un Framework de l'état de l'art dédié à la programmation parallèle de tels algorithmes.
En particulier, deux cas d'applications réelles ont été étudiées en détails et déployées sur différentes architectures à mémoire distribuée, l’un venant de l’analyse de trace d’exécution et l’autre du domaine du traitement de données génomiques. Ces études ont permis de mettre en évidence l’existence de régimes de fonctionnement permettant d'identifier des points de fonctionnements pertinents dans lesquels on souhaitera placer un système pour maximiser son efficacité. Dans un deuxième temps, une étude a permis de comparer l'efficacité d'architectures généralistes (type commodity cluster) et d'architectures plus spécialisées (type serveur de calcul hautes performances) pour le traitement de graphes distribué. Cette étude a démontré que les architectures composées de grappes de machines de type workstation, moins onéreuses et plus simples, permettaient d'obtenir des performances plus élevées. Cet écart est d'avantage accentué quand les performances sont pondérées par les coûts d'achats et opérationnels. L'étude du comportement en performance de ces architectures a également permis de proposer in fine des règles de dimensionnement et de conception des architectures distribuées, dans ce contexte. En particulier, nous montrons comment l’étude des performances fait apparaitre les axes d’amélioration du matériel et comment il est possible de dimensionner un cluster pour traiter efficacement une instance donnée. Finalement, des propositions matérielles pour la conception de serveurs de calculs plus performants pour le traitement de graphes sont formulées. Premièrement, un mécanisme est proposé afin de tempérer la baisse significative de performance observée quand le cluster opère dans un point de fonctionnement où la mémoire vive est saturée. Enfin, les deux applications développées ont été évaluées sur une architecture à base de processeurs basse-consommation afin d'étudier la pertinence de telles architectures pour le traitement de graphes. Les performances mesurés en utilisant de telles plateformes sont encourageantes et montrent en particulier que la diminution des performances brutes par rapport aux architectures existantes est compensée par une efficacité énergétique bien supérieure.
- Published
- 2017
162. Evaluation of Criteria for the Implementation of High-Performance Computing (HPC) in Danube Region Countries Using Fuzzy PIPRECIA Method.
- Author
-
Tomašević, Milovan, Lapuh, Lucija, Stević, Željko, Stanujkić, Dragiša, and Karabašević, Darjan
- Abstract
The use of computers with outstanding performance has become a real necessity in order to achieve greater efficiency and sustainability in the accomplishment of various tasks. Therefore, with the development of information technology and increasing dynamism in the business environment, it is expected that these computers will be deployed more intensively. In this paper, research was conducted in the Danube region countries: Austria, Bosnia and Herzegovina, Bulgaria, Croatia, Czech Republic, Germany, Hungary, Moldova, Montenegro, Romania, Serbia, Slovakia, Slovenia, and Ukraine. The aim of the research was to determine which criteria are most significant for the introduction of high-performance computing and the real situation in each of the countries. In addition, the aim was to establish the infrastructure needed to implement such a system. In order to determine the partial significance of each criterion, and thus the possibility of implementing high-performance computing, a multi-criteria model in a fuzzy environment was applied. The criteria weights and rankings were obtained using the Fuzzy PIvot Pairwise RElative Criteria Importance Assessment (fuzzy PIPRECIA) method. The results indicate that the values obtained depend on the decision-makers (DMs) in each country. Spearman's and Pearson's correlation coefficients were calculated to verify the results obtained. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
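Result 162 checks the consistency of the criteria weights produced by different decision-makers by computing Spearman's and Pearson's correlation coefficients. The snippet below sketches only that verification step; the two weight vectors are invented for illustration, and the fuzzy PIPRECIA weighting that would produce them is not reproduced.

```python
# Verification step only: correlate criteria weights from two decision-makers.
# The weight vectors are made up; the fuzzy PIPRECIA method is not implemented here.
from scipy.stats import pearsonr, spearmanr

weights_dm1 = [0.18, 0.14, 0.12, 0.11, 0.10, 0.09, 0.09, 0.09, 0.08]
weights_dm2 = [0.16, 0.15, 0.13, 0.10, 0.11, 0.09, 0.10, 0.08, 0.08]

rho, _ = spearmanr(weights_dm1, weights_dm2)   # agreement of the criteria rankings
r, _ = pearsonr(weights_dm1, weights_dm2)      # agreement of the weight values
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```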
163. High throughput software-based gradient tree boosting positioning for PET systems.
- Author
-
Wassermann C, Mueller F, Dey T, Lambertus J, Schug D, Schulz V, and Miller J
- Subjects
- Positron-Emission Tomography, Software
- Abstract
The supervised machine learning technique Gradient Tree Boosting (GTB) has shown good accuracy for position estimation of gamma interactions in PET crystals in bench-top experiments, while its computational requirements can easily be adjusted. Transitioning to preclinical and clinical applications requires near real-time processing at the scale of full PET systems. In this work, a high-throughput GTB-based singles positioning C++ implementation is proposed and a series of optimizations are evaluated regarding their effect on the achievable processing throughput. Moreover, the crucial feature and parameter selection for GTB is investigated for the segmented detectors of the Hyperion II D PET insert with two main models and a range of GTB hyperparameters. The proposed framework achieves singles positioning throughputs of more than 9.5 GB/s for smaller models and of 240 MB/s for more complex models on a recent Intel Skylake server. Detailed throughput analysis reveals the key performance-limiting factors, and an empirical throughput model is derived to guide the GTB model selection process and scanner design decisions. The throughput model allows for throughput estimations with a mean absolute error (MAE) of 175.78 MB/s. (Creative Commons Attribution license.)
- Published
- 2021
- Full Text
- View/download PDF
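Result 163 frames singles positioning as a supervised regression problem: detector signals go in, an estimated gamma interaction position comes out, and a gradient tree boosting model learns the mapping. The sketch below shows that setup on purely synthetic data with scikit-learn; the features, hyperparameters and error figure are assumptions for the example and do not correspond to the Hyperion II D detector or the paper's C++ implementation.

```python
# Toy gradient-tree-boosting positioning example on synthetic data.
# Not the paper's framework: the "light distribution" features are simulated here
# only to illustrate the regression setup (detector signals -> interaction position).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_events, n_channels = 5000, 16

true_pos = rng.uniform(0.0, 25.0, n_events)              # interaction position in mm
channel_centres = np.linspace(0.0, 25.0, n_channels)
# Synthetic light distribution: Gaussian channel response plus readout noise.
signals = np.exp(-0.5 * ((true_pos[:, None] - channel_centres) / 3.0) ** 2)
signals += rng.normal(0.0, 0.02, signals.shape)

X_train, X_test, y_train, y_test = train_test_split(signals, true_pos, random_state=0)
model = GradientBoostingRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

print("positioning MAE [mm]:", np.mean(np.abs(model.predict(X_test) - y_test)))
```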
164. Exploration of parallel graph-processing algorithms on distributed architectures
- Author
-
Collet, Julien, STAR, ABES, Heuristique et Diagnostic des Systèmes Complexes [Compiègne] (Heudiasyc), Université de Technologie de Compiègne (UTC)-Centre National de la Recherche Scientifique (CNRS), Université de Technologie de Compiègne, Jacques Carlier, and Renaud Sirdey
- Subjects
[SPI.OTHER]Engineering Sciences [physics]/Other, Big Data, [INFO.INFO-OH] Computer Science [cs]/Other [cs.OH], High-performance data analytics, GraphLab, [SPI.OTHER] Engineering Sciences [physics]/Other, Parallel and distributed computing, [INFO.INFO-OH]Computer Science [cs]/Other [cs.OH], Programming models, Modèles de programmation, Performance monitoring, High-performance computing (HPC), Graph-processing
- Abstract
With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graph-processing algorithms, whose parallelization and complexity are highly data-dependent, hampers usability. Moreover, the difficult evaluation of the performance behavior of these applications complicates the assessment of the relevance of the architecture used. With this in mind, this thesis work deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state-of-the-art graph-processing framework. Two use-cases are considered. For each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can eventually be leveraged to appropriately select a relevant operating point with respect to the datasets processed and the cluster nodes used. A further study enables a performance comparison of commodity cluster architectures and higher-end compute servers using the two use-cases previously developed. This study highlights the particular relevance of using clustered commodity workstations, which are considerably cheaper and simpler with respect to node architecture, over higher-end systems in this applicative context. Then, this thesis work explores how performance studies are helpful in cluster design for graph-processing. In particular, studying the throughput performance of a graph-processing system gives fruitful insights for further node architecture improvements. Moreover, this work shows that a more in-depth performance analysis can lead to guidelines for the appropriate sizing of a cluster for a given workload, paving the way toward resource allocation for graph-processing. Finally, hardware improvements for next generations of graph-processing servers are proposed and evaluated. A flash-based victim-swap mechanism is proposed for the mitigation of unwanted overloaded operations. Then, the relevance of ARM-based microservers for graph-processing is investigated with a port of GraphLab on an NVIDIA TX2-based architecture. Avec l'explosion du volume de données produites chaque année, les applications du domaine du traitement de graphes ont de plus en plus besoin d'être parallélisées et déployées sur des architectures distribuées afin d'adresser le besoin en mémoire et en ressource de calcul. Si de telles architectures larges échelles existent, issue notamment du domaine du calcul haute performance (HPC), la complexité de programmation et de déploiement d'algorithmes de traitement de graphes sur de telles cibles est souvent un frein à leur utilisation. De plus, la difficile compréhension, a priori, du comportement en performances de ce type d'applications complexifie également l'évaluation du niveau d'adéquation des architectures matérielles avec de tels algorithmes. Dans ce contexte, ces travaux de thèses portent sur l'exploration d'algorithmes de traitement de graphes sur architectures distribuées en utilisant GraphLab, un Framework de l'état de l'art dédié à la programmation parallèle de tels algorithmes.
En particulier, deux cas d'applications réelles ont été étudiées en détails et déployées sur différentes architectures à mémoire distribuée, l’un venant de l’analyse de trace d’exécution et l’autre du domaine du traitement de données génomiques. Ces études ont permis de mettre en évidence l’existence de régimes de fonctionnement permettant d'identifier des points de fonctionnements pertinents dans lesquels on souhaitera placer un système pour maximiser son efficacité. Dans un deuxième temps, une étude a permis de comparer l'efficacité d'architectures généralistes (type commodity cluster) et d'architectures plus spécialisées (type serveur de calcul hautes performances) pour le traitement de graphes distribué. Cette étude a démontré que les architectures composées de grappes de machines de type workstation, moins onéreuses et plus simples, permettaient d'obtenir des performances plus élevées. Cet écart est d'avantage accentué quand les performances sont pondérées par les coûts d'achats et opérationnels. L'étude du comportement en performance de ces architectures a également permis de proposer in fine des règles de dimensionnement et de conception des architectures distribuées, dans ce contexte. En particulier, nous montrons comment l’étude des performances fait apparaitre les axes d’amélioration du matériel et comment il est possible de dimensionner un cluster pour traiter efficacement une instance donnée. Finalement, des propositions matérielles pour la conception de serveurs de calculs plus performants pour le traitement de graphes sont formulées. Premièrement, un mécanisme est proposé afin de tempérer la baisse significative de performance observée quand le cluster opère dans un point de fonctionnement où la mémoire vive est saturée. Enfin, les deux applications développées ont été évaluées sur une architecture à base de processeurs basse-consommation afin d'étudier la pertinence de telles architectures pour le traitement de graphes. Les performances mesurés en utilisant de telles plateformes sont encourageantes et montrent en particulier que la diminution des performances brutes par rapport aux architectures existantes est compensée par une efficacité énergétique bien supérieure.
- Published
- 2017
165. Is Arm software ecosystem ready for HPC?
- Author
-
Banchelli Gracia, Fabio F., Ruiz, Daniel, Hao Xu Lin, Ying, Mantovani, Filippo, and Barcelona Supercomputing Center
- Subjects
Arm system, Supercomputadors, Enginyeria mecànica [Àrees temàtiques de la UPC], High performance computing, High-performance computing (HPC), Codes
- Abstract
In recent years, the HPC community has increasingly grown its interest towards the Arm architecture, with research projects targeting primarily the installation of Arm-based clusters. State-of-the-art research project examples are the European Mont-Blanc, the Japanese Post-K, and the UK's GW4/EPSRC. Primary attention is usually given to hardware platforms, and the Arm HPC community is growing as the hardware is evolving towards HPC workloads via solutions borrowed from the mobile market, e.g., big.LITTLE, and additions such as the Armv8-A Scalable Vector Extension (SVE) technology. However, the availability of a mature software ecosystem and the possibility of running large and complex HPC applications play a key role in the consolidation process of a new technology, especially in a conservative market like HPC. For this reason, in this poster we present a preliminary evaluation of the Arm system software ecosystem, limited here to the Arm HPC Compiler and the Arm Performance Libraries, together with a porting and testing of three fairly complex HPC code suites: QuantumESPRESSO, WRF and FEniCS. The selection of these codes has not been totally random: they have in fact been proposed as HPC challenges during the last two editions of the Student Cluster Competition at ISC, where all the authors have been involved operating an Arm-based cluster and were awarded the Fan Favorite award. The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [3], grant agreements n. 288777, 610402 and 671697. The authors would also like to thank E4 Computer Engineering for providing part of the hardware resources needed for the evaluation carried out in this poster as well as for greatly supporting the Student Cluster Competition team.
- Published
- 2017
166. Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer
- Author
-
Leonardo Bautista-Gomez, Ferad Zyulkyarov, Osman Unsal, Simon McIntosh-Smith, and Barcelona Supercomputing Center
- Subjects
DRAM (Dynamic random access memory), Supercomputadors, Large scale systems, Enginyeria electrònica [Àrees temàtiques de la UPC], Supercomputers--Programming, DRAM errors, High-performance computing (HPC), Supercomputers, Ordinadors--Dispositius de memòria
- Abstract
Supercomputers offer new opportunities for scientific computing as they grow in size. However, their growth also poses new challenges. Resilience has been recognized as one of the most pressing issues to solve for extreme scale computing. Transistor scaling in the single-digit nanometer era and power constraints might dramatically increase the failure rate of next generation machines. DRAM errors have been analyzed in the past for different supercomputers but those studies are usually based on job scheduler logs and counters produced by hardware-level error correcting codes. Consequently, little is known about errors escaping hardware checks, which lead to silent data corruption. This work attempts to fill that gap by analyzing memory errors for over a year on a cluster with about 1000 nodes featuring low-power memory without error correction. The study gathered millions of events recording detailed information of thousands of memory errors, many of them corrupting multiple bits. Several factors are analyzed, such as temporal and spatial correlation between errors, but also the influence of temperature and even the position of the sun in the sky. The study showed that most multi-bit errors corrupted non-adjacent bits in the memory word and that most errors flipped memory bits from 1 to 0. In addition, we observed thousands of cases of multiple single-bit errors occurring simultaneously in different regions of the memory. These new observations would not be possible by simply analyzing error correction counters on classical systems. We propose several directions in which the findings of this study can help the design of more reliable systems in the future. The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-Blanc 2 Project (www.montblanc-project.eu), grant agreement n 610402 and it has been supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P.
- Published
- 2016
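Result 166 derives its findings from raw error events, for example that most multi-bit errors hit non-adjacent bits and that most flips go from 1 to 0. The snippet below shows how such per-word statistics can be computed from (expected, observed) word pairs; the three sample events are invented and the study's actual event format is not reproduced.

```python
# Per-word bit-flip statistics from (expected, observed) pairs - illustrative only.
# The sample events below are made up; they are not data from the study.
def analyse(expected: int, observed: int, width: int = 64):
    diff = expected ^ observed
    flipped = [i for i in range(width) if diff >> i & 1]
    one_to_zero = sum(1 for i in flipped if expected >> i & 1)   # 1 -> 0 flips
    zero_to_one = len(flipped) - one_to_zero                     # 0 -> 1 flips
    adjacent = any(b - a == 1 for a, b in zip(flipped, flipped[1:]))
    return len(flipped), one_to_zero, zero_to_one, adjacent

events = [
    (0xFFFF_0000_FFFF_0000, 0xFFFD_0000_FFFF_0000),   # single-bit 1 -> 0 flip
    (0xFFFF_0000_FFFF_0000, 0xFFFD_0000_FBFF_0000),   # multi-bit, non-adjacent flips
    (0x0000_0000_0000_00F0, 0x0000_0000_0000_01F0),   # single-bit 0 -> 1 flip
]
for exp, obs in events:
    n, n10, n01, adj = analyse(exp, obs)
    print(f"bits flipped: {n}, 1->0: {n10}, 0->1: {n01}, adjacent: {adj}")
```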
167. Performance optimisation and productivity centre of excellence
- Author
-
Sally Bridgwater and Barcelona Supercomputing Center
- Subjects
Component, Operations research, Computer science, Paraver, Optimització matemàtica, media_common.quotation_subject, Best practice, 02 engineering and technology, computer.software_genre, Performance audit, Supercomputadors, Excellence, Scalasca, Return on investment, 0202 electrical engineering, electronic engineering, information engineering, Productivity, media_common, Government, Profiling, Enginyeria electrònica [Àrees temàtiques de la UPC], 020206 networking & telecommunications, Optimization and computation series, Engineering management, Code refactoring, 020201 artificial intelligence & image processing, High performance computing, Performance improvement, High-performance computing (HPC), computer, Performance optimization
- Abstract
As machines get larger and scientific applications advance, it is more and more imperative to fully utilize high-performance computing (HPC) capability. The complexity and changing landscape of parallel computers may leave users unable or unsure how to achieve optimal performance from their applications and fully utilize their HPC resources. The Performance Optimisation and Productivity Centre of Excellence in Computing Applications (POP) has received funding from the European Commission as part of the Horizon 2020 programme to help alleviate these issues. It aims to uncover inefficiencies and their causes in existing parallel HPC applications, leading to an improvement in the productivity and competitiveness of European organizations in academia, government and industry. The POP project will drive efforts to highlight the need for, and best practices in, performance optimization through performance audits on codes, along with training events to improve knowledge in this area. The aim is to help developers target their code development and refactoring in the most efficient direction and provide a return on investment from the savings due to the performance improvement. The POP project combines the expertise and experience of Barcelona Supercomputing Center (BSC), High Performance Computing Center Stuttgart, Jülich Supercomputing Centre (JSC), Numerical Algorithms Group Ltd (NAG), RWTH Aachen and TERATEC. This combination provides longstanding and well-respected resources in the academic and commercial realms. The POP members have come together to create a coherent and consistent methodology to give a clear, precise and useful overview of the performance of each HPC application. The services of the POP project are free of charge to organizations within the EU, to analyze and advise on any parallel code in academic, government or industrial organizations of any domain.
- Published
- 2016
- Full Text
- View/download PDF
168. Is Arm software ecosystem ready for HPC?
- Author
-
Barcelona Supercomputing Center, Banchelli Gracia, Fabio F., Ruiz, Daniel, Hao Xu Lin, Ying, and Mantovani, Filippo
- Abstract
In recent years, the HPC community has increasingly grown its interest towards the Arm architecture, with research projects targeting primarily the installation of Arm-based clusters. State-of-the-art research project examples are the European Mont-Blanc, the Japanese Post-K, and the UK's GW4/EPSRC. Primary attention is usually given to hardware platforms, and the Arm HPC community is growing as the hardware is evolving towards HPC workloads via solutions borrowed from the mobile market, e.g., big.LITTLE, and additions such as the Armv8-A Scalable Vector Extension (SVE) technology. However, the availability of a mature software ecosystem and the possibility of running large and complex HPC applications play a key role in the consolidation process of a new technology, especially in a conservative market like HPC. For this reason, in this poster we present a preliminary evaluation of the Arm system software ecosystem, limited here to the Arm HPC Compiler and the Arm Performance Libraries, together with a porting and testing of three fairly complex HPC code suites: QuantumESPRESSO, WRF and FEniCS. The selection of these codes has not been totally random: they have in fact been proposed as HPC challenges during the last two editions of the Student Cluster Competition at ISC, where all the authors have been involved operating an Arm-based cluster and were awarded the Fan Favorite award. The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [3], grant agreements n. 288777, 610402 and 671697. The authors would also like to thank E4 Computer Engineering for providing part of the hardware resources needed for the evaluation carried out in this poster as well as for greatly supporting the Student Cluster Competition team.
- Published
- 2017
169. GLT: A Unified API for Lightweight Thread Libraries
- Author
-
Barcelona Supercomputing Center, Castelló, Adrián, Seo, Sangmin, Mayo, Rafael, Balaji, Pavan, Quintana Ortí, Enrique Salvador, and Peña, Antonio J.
- Abstract
In recent years, several lightweight thread (LWT) libraries have emerged to tackle exascale challenges. These offer programming models (PMs) based on user-level threads and incorporate their own lightweight mechanisms. However, each library proposes its own PM, exposing different semantics and hindering portability. To address this drawback, we have designed Generic Lightweight Thread (GLT), an application programming interface that frames the functionality of the most popular LWT libraries for high-performance computing under a single PM. We implement GLT on top of Argobots, MassiveThreads, and Qthreads. We provide GLT as a dynamic library, as well as in the form of a static version based on macro preprocessing resolution to reduce overhead. This paper discusses the GLT PM and demonstrates its minimal performance impact. Researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. Antonio J. Peña is co-financed by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357.
- Published
- 2017
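GLT, described in result 169, puts a single programming model in front of several lightweight-thread libraries (Argobots, MassiveThreads, Qthreads), shipped both as a dynamic library and as a macro-resolved static version. The Python sketch below illustrates only the facade idea of one API dispatching to interchangeable backends; GLT itself is a C API, and the class and method names here are placeholders rather than GLT or LWT calls.

```python
# Facade sketch: one thin API in front of interchangeable "threading" backends.
# Purely illustrative of the unified-API idea; none of GLT's real C calls appear here.
import threading

class ThreadingBackend:
    """Placeholder backend built on Python's threading module."""
    def spawn(self, fn, *args):
        t = threading.Thread(target=fn, args=args)
        t.start()
        return t

    def join(self, handle):
        handle.join()

class UnifiedLWT:
    """Single programming model; the backend can be swapped without touching user code."""
    def __init__(self, backend):
        self.backend = backend

    def parallel_for(self, fn, items):
        handles = [self.backend.spawn(fn, item) for item in items]
        for h in handles:
            self.backend.join(h)

api = UnifiedLWT(ThreadingBackend())   # swapping the backend retargets all user code
api.parallel_for(lambda i: print(f"task {i} done"), range(4))
```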
170. Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators
- Author
-
Daniel Oliveira, Luigi Carro, José María Cela, Paolo Rech, Philippe O. A. Navaux, Fernando Antonio Da Silva Fernandes, Caio Lunardi, Mauricio Hanzich, Laércio Lima Pilla, Vinicius Fratin, and Barcelona Supercomputing Center
- Subjects
Computer science, Energies [Àrees temàtiques de la UPC], Reliability (computer networking), Parallel Accelerators, Radiative forcing, 02 engineering and technology, Parallel computing, Finite Difference Methods, 01 natural sciences, Stencil, Set (abstract data type), Supercomputadors, Approximation error, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Detectors de radiació, 020203 distributed computing, Parallel processing (Electronic computers), 010308 nuclear & particles physics, Finite difference method, Supercomputer, Criticality, Computer engineering, High performance computing, High-performance computing (HPC), Xeon Phi
- Abstract
In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications' output, correlating the number of corrupted elements with their spatial locality. Also, we provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude. We apply the selected metrics to experimental results obtained in various radiation test campaigns for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while the Xeon Phi is more reliable when executing particle interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures. This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research project grant 99999.007556/2015-02, EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA.
- Published
- 2016
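Result 170 scores error criticality with metrics such as the number of corrupted output elements, their spatial locality, and a dataset-wise mean relative error against a fault-free run. The sketch below computes simplified versions of those three quantities for a pair of arrays; the golden/faulty data are synthetic and the metric definitions are assumptions, not the exact ones used in the paper.

```python
# Simplified output-corruption metrics: count, mean relative error, spatial locality.
# Golden and corrupted outputs are synthetic; metric definitions are illustrative.
import numpy as np

rng = np.random.default_rng(1)
golden = rng.random((256, 256))
faulty = golden.copy()
faulty[100, 100] *= 1.5            # injected corruption (element A)
faulty[100, 101] += 0.2            # injected corruption (adjacent element B)

corrupted = ~np.isclose(golden, faulty)
n_corrupted = int(corrupted.sum())

# Dataset-wise mean relative error over the corrupted elements.
rel_err = np.abs(faulty - golden) / np.maximum(np.abs(golden), 1e-12)
mean_rel_err = float(rel_err[corrupted].mean()) if n_corrupted else 0.0

# Crude spatial-locality indicator: are any two corrupted elements neighbours?
idx = np.argwhere(corrupted)
clustered = any(np.abs(idx[i] - idx[j]).sum() == 1
                for i in range(len(idx)) for j in range(i + 1, len(idx)))

print(f"corrupted elements: {n_corrupted}, mean relative error: {mean_rel_err:.3f}, "
      f"adjacent corruption: {clustered}")
```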
171. The Mont-Blanc prototype: an alternative approach for HPC systems
- Author
-
Nikola Rajovic, Alejandro Rico, Filippo Mantovani, Daniel Ruiz, Josep Oriol Vilarrubi, Constantino Gomez, Luna Backes, Diego Nieto, Harald Servat, Xavier Martorell, Jesus Labarta, Eduard Ayguade, Chris Adeniyi-Jones, Said Derradji, Herve Gloaguen, Piero Lanucara, Nico Sanna, Jean-Francois Mehaut, Kevin Pouget, Brice Videau, Eric Boyer, Momme Allalen, Axel Auweter, David Brayford, Daniele Tafani, Volker Weinberg, Dirk Brommel, Rene Halver, Jan H. Meinke, Ramon Beivide, Mariano Benito, Enrique Vallejo, Mateo Valero, Alex Ramirez, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, and Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
- Subjects
020203 distributed computing, Informàtica [Àrees temàtiques de la UPC], 020204 information systems, Roadrunner supercomputers, 0202 electrical engineering, electronic engineering, information engineering, ASCI Red, 02 engineering and technology, High performance computing, High-performance computing (HPC), Càlcul intensiu (Informàtica)
- Abstract
High-performance computing (HPC) is recognized as one of the pillars for further progress in science, industry, medicine, and education. Current HPC systems are being developed to overcome emerging architectural challenges in order to reach the Exascale level of performance, projected for the year 2020. The much larger embedded and mobile market allows for rapid development of intellectual property (IP) blocks and provides more flexibility in designing an application-specific system-on-chip (SoC), in turn providing the possibility of balancing performance, energy efficiency, and cost. In the Mont-Blanc project, we advocate for HPC systems built from such commodity IP blocks, currently used in embedded and mobile SoCs. As a first demonstrator of such an approach, we present the Mont-Blanc prototype: the first HPC system built with commodity SoCs, memories, and network interface cards (NICs) from the embedded and mobile domain, and off-the-shelf HPC networking, storage, cooling, and integration solutions. We present the system's architecture and evaluate both performance and energy efficiency. Further, we compare the system's abilities against a production-level supercomputer. At the end, we discuss parallel scalability and estimate the maximum scalability point of this approach across a set of applications.
- Published
- 2016
- Full Text
- View/download PDF
172. High Resolution Model Intercomparison Project (HighResMIP v1.0) for CMIP6
- Author
-
Haarsma, R. J., Roberts, M. J., Vidale, P. L., Senior, C. A., Bellucci, Bao, Chang, Corti, Fućkar, N. S., Guemas, von Hardenberg, Hazeleger, Kodama, Koenigk, Leung, L. R., Luo, J.-J., Mao, Mizielinski, M. S., Mizuta, Nobre, Satoh, Scoccimarro, Semmler, Small, von Storch, J.-S., and Barcelona Supercomputing Center (Geosci. Model Dev., 9, 4185-4208, doi:10.5194/gmd-9-4185-2016, 2016)
- Subjects
Meteorologie en Luchtkwaliteit, Research program, High Resolution Model Intercomparison Project (HighResMIP), Climate Research, Meteorology and Air Quality, 010504 meteorology & atmospheric sciences, Meteorology, Atmospheric circulation, media_common.quotation_subject, 0208 environmental biotechnology, Climate Models, Fidelity, Climate change, 02 engineering and technology, 01 natural sciences, Klimatforskning, Robustness (computer science), Life Science, High resolution, 0105 earth and related environmental sciences, Grand Challenges, media_common, WIMEK, World Climate Research Program (WCRP), lcsh:QE1-996.5, Inter comparison projects, Enginyeria biomèdica [Àrees temàtiques de la UPC], Madden–Julian oscillation, Circulació atmosfèrica, Coupled Model Intercomparison Project 6 (CMIP6), 020801 environmental engineering, lcsh:Geology, 13. Climate action, Climatology, Environmental science, Tropical cyclone, High-performance computing (HPC), Canvis climàtics
- Abstract
Robust projections and predictions of climate variability and change, particularly at regional scales, rely on the driving processes being represented with fidelity in model simulations. The role of enhanced horizontal resolution in improved process representation in all components of the climate system is of growing interest, particularly as some recent simulations suggest both the possibility of significant changes in large-scale aspects of circulation as well as improvements in small-scale processes and extremes. However, such high-resolution global simulations at climate timescales, with resolutions of at least 50 km in the atmosphere and 0.25° in the ocean, have been performed at relatively few research centres and generally without overall coordination, primarily due to their computational cost. Assessing the robustness of the response of simulated climate to model resolution requires a large multi-model ensemble using a coordinated set of experiments. The Coupled Model Intercomparison Project 6 (CMIP6) is the ideal framework within which to conduct such a study, due to the strong link to models being developed for the CMIP DECK experiments and other model intercomparison projects (MIPs). Increases in high-performance computing (HPC) resources, as well as the revised experimental design for CMIP6, now enable a detailed investigation of the impact of increased resolution up to synoptic weather scales on the simulated mean climate and its variability. The High Resolution Model Intercomparison Project (HighResMIP) presented in this paper applies, for the first time, a multi-model approach to the systematic investigation of the impact of horizontal resolution. A coordinated set of experiments has been designed to assess both a standard and an enhanced horizontal-resolution simulation in the atmosphere and ocean. The set of HighResMIP experiments is divided into three tiers consisting of atmosphere-only and coupled runs and spanning the period 1950–2050, with the possibility of extending to 2100, together with some additional targeted experiments. This paper describes the experimental set-up of HighResMIP, the analysis plan, the connection with the other CMIP6 endorsed MIPs, as well as the DECK and CMIP6 historical simulations. HighResMIP thereby focuses on one of the CMIP6 broad questions, “what are the origins and consequences of systematic model biases?”, but we also discuss how it addresses the World Climate Research Program (WCRP) grand challenges. PRIMAVERA project members (Malcolm J. Roberts, Reindert J. Haarsma, Pier Luigi Vidale, Torben Koenigk, Virginie Guemas, Susanna Corti, Jost von Hardenberg, Jin-Song von Storch, Wilco Hazeleger, Catherine A. Senior, Matthew S. Mizielinski, Tido Semmler, Alessio Bellucci, Enrico Scoccimarro, Neven S. Fuckar) acknowledge funding received from the European Commission under grant agreement 641727 of the Horizon 2020 research programme. Chihiro Kodama acknowledges Y. Yamada, M. Nakano, T. Nasuno, T. Miyakawa, and H. Miura for analysis ideas. Neven S. Fuckar acknowledges support of the Juan de la Cierva-incorporación postdoctoral fellowship from the Ministry of Economy and Competitiveness of Spain. L. Ruby Leung and Jian Lu acknowledge support from the U.S. Department of Energy Office of Science Biological and Environmental Research as part of the Regional and Global Climate Modeling Program. The Pacific Northwest National Laboratory is operated for the DOE by Battelle Memorial Institute under contract DE-AC05-76RLO1830.
Jiafu Mao is supported by the Biogeochemistry-Climate Feedbacks Scientific Focus Area project funded through the Regional and Global Climate Modeling Program in the Climate and Environmental Sciences Division (CESD) of the Biological and Environmental Research (BER) Program in the U.S. Department of Energy Office of Science. Oak Ridge National Laboratory is managed by UT-Battelle for the DOE under contract DE-AC05-00OR22725. Paulo Nobre acknowledges support from CNPq grant nos. 573797/2008-0 and 490237/2011-8, and FAPESP grant no. 2008/57719-9. Chihiro Kodama and Masaki Satoh are supported by the Program for Risk Information on Climate Change (SOSEI) and FLAGSHIP2020 within priority study 4 (Advancement of meteorological and global environmental predictions utilizing observational “Big Data”), which are promoted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan. Ping Chang is supported by US National Science Foundation grants AGS-1462127 and AGS-1067937, and National Oceanic and Atmospheric Administration grant NA11OAR4310154, as well as by China’s National Basic Research Priorities Programme (2013CB956204 and 2014CB745000). We thank Martin Juckes and his team for all their work on the HighResMIP and CMIP6 data request; Nick Rayner and John Kennedy for allowing early access to the HadISST2 daily, 1/4° SST and sea-ice dataset; Mark Ringer and Mark Webb for ideas for the targeted CFMIP-style experiment; and Francois Massonnet for discussions on high-resolution modelling and sea ice.
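To make the computational-cost remark in the abstract concrete, here is a back-of-the-envelope sketch (not from the paper) that assumes the cost of an atmosphere component grows with the number of horizontal grid cells and, through a CFL-limited time step, with one additional factor of the refinement ratio; the baseline and target spacings are illustrative values only.

/* Rough cost scaling when refining horizontal resolution from dx0 to dx1 (km).
 * Assumes cost ~ (dx0/dx1)^2 more cells * (dx0/dx1) more time steps (CFL limit);
 * a simplification that ignores vertical levels, I/O, and solver changes. */
#include <stdio.h>

int main(void) {
    const double dx0 = 100.0; /* baseline horizontal grid spacing, km (assumed) */
    const double dx1 = 50.0;  /* HighResMIP-like target spacing, km */
    double ratio = dx0 / dx1;
    double cost_factor = ratio * ratio * ratio; /* cells (2D) x time steps */
    printf("refining %.0f km -> %.0f km costs roughly %.0fx more compute\n",
           dx0, dx1, cost_factor);
    return 0;
}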
- Published
- 2016
- Full Text
- View/download PDF
173. Performance optimisation and productivity centre of excellence
- Author
-
Barcelona Supercomputing Center and Bridgwater, Sally
- Abstract
As machines get larger and scientific applications advance, it becomes increasingly imperative to fully utilize high performance computing (HPC) capability. The complexity and changing landscape of parallel computers may leave users unable or unsure how to achieve optimal performance from their applications and fully utilize their HPC resources. The Performance Optimisation and Productivity Centre of Excellence in Computing Applications (POP) has received funding from the European Commission as part of the Horizon 2020 programme to help alleviate these issues. It aims to uncover inefficiencies and their causes in existing parallel HPC applications, leading to an improvement in the productivity and competitiveness of European organizations in academia, government and industry. The POP project will drive efforts to highlight the need for, and best practices in, performance optimization through performance audits on codes, along with training events to improve knowledge in this area. The aim is to help developers target their code development and refactoring in the most efficient direction and to provide a return on investment from the savings due to the performance improvement. The POP project combines the expertise and experience of Barcelona Supercomputing Center (BSC), High Performance Computing Center Stuttgart, Jülich Supercomputing Centre (JSC), Numerical Algorithms Group Ltd (NAG), RWTH Aachen and TERATEC. This combination provides longstanding and well-respected resources in the academic and commercial realms. The POP members have come together to create a coherent and consistent methodology to give a clear, precise and useful overview of the performance of each HPC application. The services of the POP project are free of charge to organizations within the EU, covering analysis of and advice on any parallel code in academic, government or industrial organizations of any domain., Peer Reviewed, Postprint (author's final draft)
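A performance audit usually starts from a simple headline number such as parallel efficiency; the sketch below computes speedup and efficiency from strong-scaling runtimes. The timings are invented, and this is a minimal illustration rather than the POP project's own metric hierarchy or tooling.

/* Minimal illustration of a first-pass efficiency check on strong-scaling runs.
 * Runtimes below are invented; in practice they come from timing a real code. */
#include <stdio.h>

int main(void) {
    const int    cores[]   = {1, 2, 4, 8, 16};
    const double runtime[] = {100.0, 52.0, 28.0, 16.5, 11.0}; /* seconds, hypothetical */
    const int n = sizeof(cores) / sizeof(cores[0]);

    for (int i = 0; i < n; ++i) {
        double speedup = runtime[0] / runtime[i];
        double efficiency = speedup / cores[i];
        printf("%3d cores: speedup %5.2f, parallel efficiency %4.0f%%\n",
               cores[i], speedup, 100.0 * efficiency);
    }
    return 0;
}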
- Published
- 2016
174. Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators
- Author
-
Barcelona Supercomputing Center, Oliveira, Daniel, Pilla, Laercio, Hanzich, Mauricio, Fratin, Vinicius, Fernandes, Fernando, Lunardi, Caio, Cela, Jose M., Navaux, Philippe, Carro, Luigi, and Rech, Paolo
- Abstract
In this paper, we evaluate the criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications’ output, correlating the number of corrupted elements with their spatial locality. We also provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude. We apply the selected metrics to experimental results obtained in various radiation test campaigns, for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while the Xeon Phi is more reliable when executing particle interactions solved through finite difference methods. Finally, iterative stencil operations seem to be the most reliable on both architectures., This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research project grant 99999.007556/2015-02, EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA., Postprint (author's final draft)
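The two metrics named in the abstract, the number of corrupted output elements and a dataset-wise mean relative error, can be sketched as below; the arrays and tolerance are invented, and this illustrates the idea rather than the authors' actual evaluation code.

/* Illustrative comparison of a "golden" output against a run observed under beam:
 * counts corrupted elements and computes a mean relative error over them.
 * Data and mismatch tolerance are invented for the example. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const double golden[]   = {1.00, 2.00, 3.00, 4.00, 5.00};
    const double observed[] = {1.00, 2.10, 3.00, 3.90, 5.00};
    const int n = sizeof(golden) / sizeof(golden[0]);
    const double tol = 1e-9; /* mismatch threshold (assumed) */

    int corrupted = 0;
    double rel_err_sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double diff = fabs(observed[i] - golden[i]);
        if (diff > tol) {
            ++corrupted;
            rel_err_sum += diff / fabs(golden[i]); /* assumes golden[i] != 0 */
        }
    }
    printf("corrupted elements: %d of %d\n", corrupted, n);
    if (corrupted > 0)
        printf("mean relative error over corrupted elements: %.3f\n",
               rel_err_sum / corrupted);
    return 0;
}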
- Published
- 2016
175. Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer
- Author
-
Barcelona Supercomputing Center, Bautista-Gomez, Leonardo, Zyulkyarov, Ferad, Unsal, Osman, and McIntosh-Smith, Simon
- Abstract
Supercomputers offer new opportunities for scientific computing as they grow in size. However, their growth also poses new challenges. Resilience has been recognized as one of the most pressing issues to solve for extreme-scale computing. Transistor scaling in the single-digit-nanometer era and power constraints might dramatically increase the failure rate of next-generation machines. DRAM errors have been analyzed in the past for different supercomputers, but those studies are usually based on job scheduler logs and counters produced by hardware-level error-correcting codes. Consequently, little is known about errors escaping hardware checks, which lead to silent data corruption. This work attempts to fill that gap by analyzing memory errors for over a year on a cluster with about 1000 nodes featuring low-power memory without error correction. The study gathered millions of events recording detailed information on thousands of memory errors, many of them corrupting multiple bits. Several factors are analyzed, including temporal and spatial correlation between errors, the influence of temperature, and even the position of the sun in the sky. The study showed that most multi-bit errors corrupted non-adjacent bits in the memory word and that most errors flipped memory bits from 1 to 0. In addition, we observed thousands of cases of multiple single-bit errors occurring simultaneously in different regions of the memory. These new observations would not be possible by simply analyzing error-correction counters on classical systems. We propose several directions in which the findings of this study can help the design of more reliable systems in the future., The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-Blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402, and it has been supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P., Peer Reviewed, Postprint (author's final draft)
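The kind of bit-level characterisation described (flip direction and adjacency of corrupted bits within a word) can be illustrated with a small routine like the one below; the 64-bit word values are invented and the adjacency definition (neighbouring bit positions in the same word) is an assumption made for the example.

/* Illustrative classification of a corrupted memory word:
 * which bits flipped, in which direction (1->0 or 0->1), and whether
 * any flipped bits are adjacent. Example values are invented. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t expected = 0xFFFF0000FFFF0000ULL; /* hypothetical stored pattern */
    uint64_t observed = 0xFFFF0000FFDF0004ULL; /* hypothetical read-back value */
    uint64_t diff = expected ^ observed;

    int flips_1to0 = 0, flips_0to1 = 0, prev_bit = -2, adjacent = 0;
    for (int b = 0; b < 64; ++b) {
        if ((diff >> b) & 1ULL) {
            if ((expected >> b) & 1ULL) ++flips_1to0; else ++flips_0to1;
            if (b == prev_bit + 1) adjacent = 1;
            prev_bit = b;
        }
    }
    printf("flipped bits: %d (1->0: %d, 0->1: %d), adjacent: %s\n",
           flips_1to0 + flips_0to1, flips_1to0, flips_0to1,
           adjacent ? "yes" : "no");
    return 0;
}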
- Published
- 2016
176. The Mont-Blanc prototype: an alternative approach for HPC systems
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Rajovic, Nikola, Rico, Alejandro, Mantovani, Filippo, Ruiz, Daniel, Vilarrubi, Josep O., Gomez, Constantino, Backes, Luna, Nieto, Diego, Servat, Harald, Martorell Bofill, Xavier, Labarta Mancho, Jesús José, Ayguadé Parra, Eduard, Adeniyi-Jones, Chris, Derradji, Said, Gloaguen, Hervé, Lanucara, Piero, Sanna, Nico, Mehaut, Jean-François, Pouget, Kevin, Videau, Brice, Boyer, Eric, Allalen, Momme, Auweter, Axel, Brayford, David, Tafani, Daniele, Weinberg, Volker, Brömmel, Dirk, Halver, René, Meinke, Jan H., Beivide Palacio, Ramon, Benito, Mariano, Vallejo, Enrique, Valero Cortés, Mateo, and Ramirez, Alex
- Abstract
High-performance computing (HPC) is recognized as one of the pillars for further progress in science, industry, medicine, and education. Current HPC systems are being developed to overcome emerging architectural challenges in order to reach the Exascale level of performance, projected for the year 2020. The much larger embedded and mobile market allows for rapid development of intellectual property (IP) blocks and provides more flexibility in designing an application-specific system-on-chip (SoC), in turn making it possible to balance performance, energy efficiency, and cost. In the Mont-Blanc project, we advocate building HPC systems from such commodity IP blocks, currently used in embedded and mobile SoCs. As a first demonstrator of this approach, we present the Mont-Blanc prototype: the first HPC system built with commodity SoCs, memories, and network interface cards (NICs) from the embedded and mobile domain, combined with off-the-shelf HPC networking, storage, cooling, and integration solutions. We present the system’s architecture and evaluate both performance and energy efficiency. Further, we compare the system’s capabilities against those of a production-level supercomputer. Finally, we discuss parallel scalability and estimate the maximum scalability point of this approach across a set of applications., Peer Reviewed, Postprint (published version)
- Published
- 2016
177. High Resolution Model Intercomparison Project (HighResMIP v1.0) for CMIP6
- Author
-
Barcelona Supercomputing Center, Haarsma, Reindert J., Roberts, Malcolm J., Vidale, Pier L., Senior, Catherine A., Bellucci, Alessio, Bao, Qing, Chang, Ping, Corti, Susanna, Fuckar, Neven S., Guemas, Virginie, Hardenberg, Jost von, Hazeleger, Wilco, Kodama, Chihiro, Koenigk, Torben, Leung, L. Ruby, Lu, Jian, Luo, Jing-Jia, Mao, Jiafu, Mizielinski, Matthew S., Mizuta, Ryo, Nobre, Paulo, Satoh, Masaki, Scoccimarro, Enrico, Semmler, Tido, Small, Justin, and von Storch, Jing-Song
- Abstract
Robust projections and predictions of climate variability and change, particularly at regional scales, rely on the driving processes being represented with fidelity in model simulations. The role of enhanced horizontal resolution in improved process representation in all components of the climate system is of growing interest, particularly as some recent simulations suggest both the possibility of significant changes in large-scale aspects of circulation as well as improvements in small-scale processes and extremes. However, such high-resolution global simulations at climate timescales, with resolutions of at least 50 km in the atmosphere and 0.25° in the ocean, have been performed at relatively few research centres and generally without overall coordination, primarily due to their computational cost. Assessing the robustness of the response of simulated climate to model resolution requires a large multi-model ensemble using a coordinated set of experiments. The Coupled Model Intercomparison Project 6 (CMIP6) is the ideal framework within which to conduct such a study, due to the strong link to models being developed for the CMIP DECK experiments and other model intercomparison projects (MIPs). Increases in high-performance computing (HPC) resources, as well as the revised experimental design for CMIP6, now enable a detailed investigation of the impact of increased resolution up to synoptic weather scales on the simulated mean climate and its variability. The High Resolution Model Intercomparison Project (HighResMIP) presented in this paper applies, for the first time, a multi-model approach to the systematic investigation of the impact of horizontal resolution. A coordinated set of experiments has been designed to assess both a standard and an enhanced horizontal-resolution simulation in the atmosphere and ocean. The set of HighResMIP experiments is divided into three tiers consisting of atmosphere-only and coupled runs and spanning the period 1950–2050, with the possibility of extending to 2100, together with some additional targeted experiments., PRIMAVERA project members (Malcolm J. Roberts, Reindert J. Haarsma, Pier Luigi Vidale, Torben Koenigk, Virginie Guemas, Susanna Corti, Jost von Hardenberg, Jin-Song von Storch, Wilco Hazeleger, Catherine A. Senior, Matthew S. Mizielinski, Tido Semmler, Alessio Bellucci, Enrico Scoccimarro, Neven S. Fuckar) acknowledge funding received from the European Commission under grant agreement 641727 of the Horizon 2020 research programme. Chihiro Kodama acknowledges Y. Yamada, M. Nakano, T. Nasuno, T. Miyakawa, and H. Miura for analysis ideas. Neven S. Fuckar acknowledges support of the Juan de la Cierva-incorporación postdoctoral fellowship from the Ministry of Economy and Competitiveness of Spain. L. Ruby Leung and Jian Lu acknowledge support from the U.S. Department of Energy Office of Science Biological and Environmental Research as part of the Regional and Global Climate Modeling Program. The Pacific Northwest National Laboratory is operated for the DOE by Battelle Memorial Institute under contract DE-AC05-76RLO1830. Jiafu Mao is supported by the Biogeochemistry-Climate Feedbacks Scientific Focus Area project funded through the Regional and Global Climate Modeling Program in the Climate and Environmental Sciences Division (CESD) of the Biological and Environmental Research (BER) Program in the U.S. Department of Energy Office of Science. Oak Ridge National Laboratory is managed by UT-Battelle for the DOE under contract DE-AC05-00OR22725. Paulo Nobre acknowledges support from CNPq grant nos. 573797/2008-0 and 490237/2011-8, and FAPESP grant no. 2008/57719-9. Chihiro Kodama and Masaki Satoh are supported by the Program for Risk Information on Climate Change (SOSEI) and FLAGSHIP2020 within priority study 4 (Advancement of meteorological and global environmental predictions utilizing observational “Big Data”), which are promoted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan. Ping Chang is supported by US National Science Foundation grants AGS-1462127 and AGS-1067937, and National Oceanic and Atmospheric Administration grant NA11OAR4310154, as well as by China’s National Basic Research Priorities Programme (2013CB956204 and 2014CB745000)., Peer Reviewed, Postprint (published version)
- Published
- 2016
178. Fully-Coupled Electromechanical Simulations of the LV Dog Anatomy Using HPC: Model Testing and Verification
- Author
-
Alfonso Santiago, Jazmin Aguado-Sierra, David Soto-Iglesias, Oscar Camara, Lydia Dux-Santoy, Mariña López-Yunta, Matias I. Rivero, Mariano Vázquez, and Barcelona Supercomputing Center
- Subjects
Engineering ,Cardiologia--Investigació ,High-Performance Computing (HPC) ,03 medical and health sciences ,0302 clinical medicine ,Left ventricular geometry ,Simulation ,030304 developmental biology ,0303 health sciences ,Ground truth ,Cardiac cycle ,Mathematical model ,business.industry ,Enginyeria electrònica [Àrees temàtiques de la UPC] ,Verification ,Solver ,Ground-truth data ,Fully coupled ,Electromechanical simulations ,Model testing ,Heart beat ,High performance computing ,business ,Canine model ,Càlcul intensiu (Informàtica) ,030217 neurology & neurosurgery ,Heart beat--Mathematical models - Abstract
Verification of electromechanical models of the heart requires a good amount of reliable, high-resolution, thorough in-vivo measurements. The detail of the mathematical models used to create simulations of the heart beat varies greatly. Generally, the objective of the simulation determines the modeling approach. However, it is important to quantify exactly the amount of error between the various approaches that can be used to simulate a heart beat by comparing them to ground-truth data. The more detailed the model is, the more computing power it requires; we therefore employ a high-performance computing solver throughout this study. We aim to compare models to experimentally measured data in order to identify the effect of using a mathematical model of fibre orientation versus fibre orientations measured with DT-MRI. We also compare simultaneous endocardial stimuli with instantaneous myocardial stimulation as the trigger of the mechanical contraction. Our results show that synchronisation of the electrical and mechanical events in the heart beat is necessary to create a physiological timing of hemodynamic events; synchronous activation of all of the myocardium produces an unrealistic timing of hemodynamic events in the cardiac cycle. Results also show the need to establish a protocol for quantifying the zero-pressure configuration of the left ventricular geometry used to initiate the simulation protocol; however, the predicted zero-pressure configuration of the same geometry differed depending on the origin of the fibre field employed. This work has been done with the support of grant SEV-2011-00067 of the Severo Ochoa Program, awarded by the Spanish Government to the Barcelona Supercomputing Center. Part of the research leading to these results has received funding from the Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 611823. It has also been partially funded by the Spanish Ministry of Economy and Competitiveness (TIN2011-28067).
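For context on what a "mathematical model of fibre orientation" can look like, the sketch below implements a commonly used rule-based approach in which the fibre helix angle rotates linearly across the ventricular wall, here from +60° at the endocardium to -60° at the epicardium; these angle bounds are typical literature values and are not necessarily the ones used in this study.

/* Rule-based fibre helix angle as a linear function of transmural depth.
 * depth = 0 at the endocardium, 1 at the epicardium. The +/-60 degree
 * bounds are typical literature values, not taken from this paper. */
#include <stdio.h>

static double fibre_angle_deg(double depth) {
    const double endo_angle = 60.0, epi_angle = -60.0;
    return endo_angle + (epi_angle - endo_angle) * depth;
}

int main(void) {
    for (double d = 0.0; d <= 1.0; d += 0.25)
        printf("transmural depth %.2f -> fibre angle %+6.1f deg\n",
               d, fibre_angle_deg(d));
    return 0;
}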
- Published
- 2015
179. Fully-Coupled Electromechanical Simulations of the LV Dog Anatomy Using HPC: Model Testing and Verification
- Author
-
Barcelona Supercomputing Center, Aguado-Sierra, Jazmin, Santiago, Alfonso, Rivero, Matías I., Lopez-Yunta, Mariña, Soto-Iglesias, David, Dux-Santoy, Lydia, Camara, Oscar, and Vázquez, Mariano
- Abstract
Verification of electromechanical models of the heart requires a good amount of reliable, high-resolution, thorough in-vivo measurements. The detail of the mathematical models used to create simulations of the heart beat varies greatly. Generally, the objective of the simulation determines the modeling approach. However, it is important to quantify exactly the amount of error between the various approaches that can be used to simulate a heart beat by comparing them to ground-truth data. The more detailed the model is, the more computing power it requires; we therefore employ a high-performance computing solver throughout this study. We aim to compare models to experimentally measured data in order to identify the effect of using a mathematical model of fibre orientation versus fibre orientations measured with DT-MRI. We also compare simultaneous endocardial stimuli with instantaneous myocardial stimulation as the trigger of the mechanical contraction. Our results show that synchronisation of the electrical and mechanical events in the heart beat is necessary to create a physiological timing of hemodynamic events; synchronous activation of all of the myocardium produces an unrealistic timing of hemodynamic events in the cardiac cycle. Results also show the need to establish a protocol for quantifying the zero-pressure configuration of the left ventricular geometry used to initiate the simulation protocol; however, the predicted zero-pressure configuration of the same geometry differed depending on the origin of the fibre field employed., This work has been done with the support of grant SEV-2011-00067 of the Severo Ochoa Program, awarded by the Spanish Government to the Barcelona Supercomputing Center. Part of the research leading to these results has received funding from the Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 611823. It has also been partially funded by the Spanish Ministry of Economy and Competitiveness (TIN2011-28067)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2015
180. 100% Green Computing At The Wrong Location?
- Author
-
Kienle, Frank (Dr.-Ing.) and de Schryver, Christian
- Subjects
Energieffizienz ,Energy Efficiency ,Topologie ,Cloud Computing ,High-Performance Computing (HPC) ,Eingebettetes System ,Green Computing ,Low-Power ,Green-IT ,Supercomputer ,Embedded System ,ddc:004 ,ddc:620 ,Smart Grid ,Hochleistungsrechnen ,Embedded Systems - Abstract
Modern society relies on convenience services and mobile communication. Cloud computing is the current trend to make data and applications available at any time on every device. Data centers concentrate computation and storage at central locations, while they claim to be green thanks to their optimized maintenance and increased energy efficiency. The key enabler for this evolution is the microelectronics industry. The trend towards power-efficient mobile devices has forced this industry to change its design dogma to: “keep data locally and reduce data communication whenever possible”. Therefore we ask: is cloud computing repeating the aberrations of its enabling industry?
- Published
- 2012
181. Performance monitoring with PAPI : Using the performance application programming interface
- Author
-
Mucci, P., Smeds, Nils, Ekman, P., Mucci, P., Smeds, Nils, and Ekman, P.
- Abstract
The importance of using the performance application programming interface (PAPI) for performance monitoring is discussed. To facilitate the development of portable performance tools, PAPI provides interfaces to get information about the execution environment. It also provides methods to obtain a complete listing of what performance monitoring (PM) events are available for monitoring. PAPI's goal is to expose real hardware performance information to users, which will help in eliminating most of the guesswork regarding the root cause of a code's performance problem., QC 20141205
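Since the record itself contains no code, here is a minimal low-level PAPI usage sketch that counts total instructions and cycles around a region of interest; the counted loop is a placeholder workload, error handling is reduced to a single check for brevity, and the program is compiled with -lpapi.

/* Minimal PAPI low-level example: count total instructions and cycles
 * around a region of interest. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void) {
    int event_set = PAPI_NULL;
    long long counts[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI init failed\n");
        return EXIT_FAILURE;
    }
    PAPI_create_eventset(&event_set);
    PAPI_add_event(event_set, PAPI_TOT_INS);  /* total instructions */
    PAPI_add_event(event_set, PAPI_TOT_CYC);  /* total cycles */

    PAPI_start(event_set);
    volatile double x = 0.0;                  /* placeholder workload */
    for (int i = 0; i < 1000000; ++i) x += i * 0.5;
    PAPI_stop(event_set, counts);

    printf("instructions: %lld, cycles: %lld\n", counts[0], counts[1]);
    return 0;
}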
- Published
- 2005
182. High-Performance Energy-Efficient Multicore Embedded Computing.
- Author
-
Munir, Arslan, Ranka, Sanjay, and Gordon-Ross, Ann
- Subjects
- *
EMBEDDED computer systems , *MULTICORE processors , *SUPERCOMPUTERS , *COMPUTER software , *MOORE'S law , *TRANSISTORS , *ENERGY consumption - Abstract
With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multicore in order to exploit this high transistor density for high performance. Embedded systems differ from traditional high-performance supercomputers in that power is a first-order constraint for embedded systems, whereas performance is the major benchmark for supercomputers. The increase in on-chip transistor density exacerbates power/thermal issues in embedded systems, which necessitates novel hardware/software power/thermal management techniques to meet the ever-increasing high-performance embedded computing demands in an energy-efficient manner. This paper outlines typical requirements of embedded applications and discusses state-of-the-art hardware/software high-performance energy-efficient embedded computing (HPEEC) techniques that help meet these requirements. We also discuss modern multicore processors that leverage these HPEEC techniques to deliver high performance per watt. Finally, we present design challenges and future research directions for HPEEC system development. [ABSTRACT FROM AUTHOR]
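As a concrete anchor for the power-management techniques such surveys cover, the snippet below evaluates the standard dynamic-power relation P = alpha * C * V^2 * f for one hypothetical voltage/frequency scaling step; the activity factor, switched capacitance, and operating points are invented for illustration and do not come from this paper.

/* Dynamic power of a CMOS core: P = alpha * C * V^2 * f.
 * All numbers below are invented operating points for illustration. */
#include <stdio.h>

static double dynamic_power(double alpha, double cap_f, double volt, double freq_hz) {
    return alpha * cap_f * volt * volt * freq_hz;
}

int main(void) {
    const double alpha = 0.2, cap_f = 1.0e-9;                /* activity factor, switched capacitance (F) */
    double p_high = dynamic_power(alpha, cap_f, 1.1, 2.0e9); /* 1.1 V @ 2.0 GHz */
    double p_low  = dynamic_power(alpha, cap_f, 0.9, 1.2e9); /* 0.9 V @ 1.2 GHz */
    printf("high: %.2f W, low: %.2f W, saving: %.0f%%\n",
           p_high, p_low, 100.0 * (1.0 - p_low / p_high));
    return 0;
}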
- Published
- 2012
- Full Text
- View/download PDF
183. GLT: A Unified API for Lightweight Thread Libraries
- Author
-
Antonio J. Peña, Enrique S. Quintana-Ortí, Rafael Mayo, Sangmin Seo, Adrián Castelló, Pavan Balaji, and Barcelona Supercomputing Center
- Subjects
020203 distributed computing ,business.industry ,Computer science ,Enginyeria elèctrica [Àrees temàtiques de la UPC] ,Lightweight thread (LWT) ,010103 numerical & computational mathematics ,02 engineering and technology ,Thread (computing) ,computer.software_genre ,01 natural sciences ,Software portability ,Supercomputadors ,Generic Lightweight Thread (GLT) ,Embedded system ,0202 electrical engineering, electronic engineering, information engineering ,Programming paradigm ,Operating system ,High performance computing ,0101 mathematics ,business ,High-performance computing (HPC) ,computer - Abstract
In recent years, several lightweight thread (LWT) libraries have emerged to tackle exascale challenges. These offer programming models (PMs) based on user-level threads and incorporate their own lightweight mechanisms. However, each library proposes its own PM, exposing different semantics and hindering portability. To address this drawback, we have designed Generic Lightweight Thread (GLT), an application programming interface that frames the functionality of the most popular LWT libraries for high-performance computing under a single PM. We implement GLT on top of Argobots, MassiveThreads, and Qthreads. We provide GLT as a dynamic library, as well as in the form of a static version based on macro preprocessing resolution to reduce overhead. This paper discusses the GLT PM and demonstrates its minimal performance impact. Researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. Antonio J. Peña is co-financed by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357.
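To illustrate the general idea of a thin interface resolved at compile time over interchangeable threading backends (this is not the actual GLT API, whose function names and semantics differ), the sketch below hides POSIX threads behind generic macros that a build system could re-point at another backend; the generic_* names are hypothetical.

/* Illustrative compile-time dispatch of a generic thread API to one backend.
 * This is NOT the GLT interface; it only shows the macro-resolution idea,
 * here with POSIX threads as the single available backend (link with -lpthread). */
#include <stdio.h>
#include <pthread.h>

#define GENERIC_BACKEND_PTHREADS 1

#if GENERIC_BACKEND_PTHREADS
typedef pthread_t generic_thread_t;
#define generic_thread_create(t, fn, arg) pthread_create((t), NULL, (fn), (arg))
#define generic_thread_join(t)            pthread_join((t), NULL)
#endif

static void *worker(void *arg) {
    printf("hello from worker %ld\n", (long)arg);
    return NULL;
}

int main(void) {
    generic_thread_t threads[4];
    for (long i = 0; i < 4; ++i)
        generic_thread_create(&threads[i], worker, (void *)i);
    for (int i = 0; i < 4; ++i)
        generic_thread_join(threads[i]);
    return 0;
}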