2,552 results for "Supercomputers"
Search Results
2. A low‐latency memory‐cube network with dual diagonal mesh topology and bypassed pipelines.
- Author
- Oda, Masashi, Keida, Kai, and Yasudo, Ryota
- Subjects
- TRAFFIC patterns, ROUTING algorithms, CUBES, SUPERCOMPUTERS, MEMORY
- Abstract
Summary: A memory cube network is an interconnection network composed of 3D stacked memories called memory cubes. By exploiting packet switching, it can provide fast memory accesses to a large number of memory cubes. Although interconnection networks have been studied for many years for supercomputers and data centers, existing technologies are difficult to apply to memory cube networks: the link length and the number of ports are limited, and hence the hop count increases. In this article, we propose a dual diagonal mesh (DDM), a layout‐oriented memory‐cube network. Furthermore, we propose a routing algorithm and a router architecture with bypassed pipelines for DDM. Our experimental results demonstrate that our routing and router architecture with bypassed pipelines reduce the memory access latency. We implement four router architectures and evaluate them with traffic patterns derived from the NAS Parallel Benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
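As a rough illustration of why diagonal links in a topology like DDM can reduce hop counts, the sketch below (not from the paper; grid sizes and routing model are simplified assumptions) compares average minimal hop counts on an n x n mesh, where Manhattan distance applies, against the same mesh augmented with diagonal links, where Chebyshev distance applies, under uniform all-to-all traffic.

```python
import itertools

def avg_hops(n, dist):
    """Average hop count over all distinct node pairs of an n x n grid."""
    nodes = list(itertools.product(range(n), range(n)))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

# Minimal routing: a plain mesh moves along one axis per hop; diagonal
# links allow moving along both axes in a single hop.
manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
chebyshev = lambda a, b: max(abs(a[0] - b[0]), abs(a[1] - b[1]))

for n in (4, 8):
    print(f"{n}x{n} mesh: {avg_hops(n, manhattan):.2f} hops,"
          f" with diagonals: {avg_hops(n, chebyshev):.2f} hops")
```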
3. Certain Domination Parameters and Their Resolving Versions of Fractal Cubic Networks.
- Author
- Prabhu, Savari, Arulmozhi, Arumugam Krishnan, and Arulperumjothi, M.
- Subjects
- FRACTALS, SUPERCOMPUTERS
- Abstract
Networks are designed to communicate, operate, and allocate tasks to their respective commodities. Operating supercomputers is challenging, a problem long addressed by the network design commonly known as the hypercube, denoted Q_n. A recent study found hypercube networks insufficient to accommodate supercomputers' parallel processors, so variants of the hypercube were devised as alternatives. One new variant, the fractal cubic network, can serve as a strong alternative to the hypercube. Our research shows that the fractal cubic network is a rooted product of two graphs. We determine its domination and resolving domination parameters, which can be applied to resource-location and broadcasting-related problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
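The fractal cubic network construction and its exact domination parameters are the paper's contribution; as generic background, a dominating set is a vertex set S such that every vertex is in S or adjacent to a member of S. Below is a minimal greedy sketch on the classical hypercube Q_n that the abstract takes as its starting point (a standard approximation, not the authors' method).

```python
def neighbors(v, n):
    """Neighbors of vertex v in the hypercube Q_n: flip each of the n bits."""
    return [v ^ (1 << i) for i in range(n)]

def greedy_dominating_set(n):
    undominated = set(range(2 ** n))
    dom = set()
    while undominated:
        # pick the vertex covering the most still-undominated vertices
        best = max(range(2 ** n), key=lambda v: len(
            undominated & ({v} | set(neighbors(v, n)))))
        dom.add(best)
        undominated -= {best} | set(neighbors(best, n))
    return dom

# For Q_3, two antipodal vertices (000 and 111) dominate the whole cube,
# so the domination number of Q_3 is 2.
print(sorted(greedy_dominating_set(3)))
```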
4. Pareto Approximation Empirical Results of Energy-Aware Optimization for Precedence-Constrained Task Scheduling Considering Switching Off Completely Idle Machines.
- Author
- Castán Rocha, José Antonio, Santiago, Alejandro, García-Ruiz, Alejandro H., Terán-Villanueva, Jesús David, Martínez, Salvador Ibarra, and Treviño Berrones, Mayra Guadalupe
- Subjects
- LANGUAGE models, MULTI-objective optimization, DIRECTED acyclic graphs, SUPERCOMPUTERS, ENERGY consumption
- Abstract
Recent advances in cloud computing, large language models, and deep learning have started a race to create massive High-Performance Computing (HPC) centers worldwide. These centers' energy consumption grows in proportion to their computing capabilities; for example, according to the TOP500 organization, the HPC systems Frontier, Aurora, and Supercomputer Fugaku report energy consumption of 22,786 kW, 38,698 kW, and 29,899 kW, respectively. Energy-aware scheduling is currently a topic of interest to many researchers. However, as far as we know, this work is the first approach that considers the idle energy consumed by HPC units and the possibility of turning off unused units entirely, driven by a quantitative objective function. We found that even when unused machines are turned off, the objectives of makespan and energy consumption still conflict, confirming the problem's multi-objective nature. This work presents empirical results for AGEMOEA, AGEMOEA2, GWASFGA, MOCell, MOMBI, MOMBI2, NSGA2, and SMS-EMOA. MOCell performs best on the 400 real scheduling problem tests, while GWASFGA performs best on a small-instance synthetic testbed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
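The abstract's central finding is that makespan and energy consumption still conflict, so candidate schedules form a Pareto front of non-dominated trade-offs. A minimal dominance filter over hypothetical (makespan, energy) pairs, lower being better in both objectives:

```python
def pareto_front(points):
    """Keep the non-dominated (makespan, energy) points; lower is better."""
    front = []
    for p in points:
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front

# Invented (makespan, energy) pairs for five candidate schedules.
schedules = [(120, 950), (130, 800), (150, 780), (125, 990), (160, 770)]
print(pareto_front(schedules))  # (125, 990) is dominated and filtered out
```

Multi-objective algorithms such as MOCell or NSGA2 approximate exactly this kind of front rather than a single optimum.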
5. A Strategic Approach to Metric Management for a Real-time Process Based on a Task Scheduling Algorithm in Distributed Mobile Environments.
- Author
- Larios Gómez, Mariano, Quintero Flores, Perfecto Malaquías, and Anzures García, Mario
- Subjects
- DISTRIBUTED algorithms, CONSTRUCTION projects, SUPERCOMPUTERS, ALGORITHMS, SCHEDULING
- Abstract
This project introduces a metric-driven strategy for enhancing real-time task planning within a mobile distributed system. The core design revolves around a real-time task planning algorithm that exploits consensus mechanisms among multiple nodes. By incorporating innovative real-time scheduling algorithms, validated on a supercomputer, the project establishes frameworks for simulating distributed mobile environments in real time. This enables the execution of tasks in unpredictable contexts, ensuring adherence to stringent time constraints and empowering decision-making capabilities. The primary objective is to optimize task allocation in mobile distributed systems, including aerial nodes, thereby facilitating seamless data transfer while maintaining data integrity. The project builds upon the Fan task scheduler, integrating compensatory measures for online load distribution and optimization of message routes, effectively reducing communication latency in dynamic mobile environments. The contributions aim to enhance real-time task allocation across diverse nodes, enabling the transfer of location and search data across networks without data loss. This approach is significant for the efficient management of tasks within dynamic scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. THE NEW COLD WARRIOR.
- Author
- LAPOWSKY, ISSIE
- Subjects
- TRADE regulation, NATIONAL security, WHITE House staff, JOINT Comprehensive Plan of Action (2015), EXTREME ultraviolet lithography, SUPERCOMPUTERS
- Abstract
The article from Wired discusses the efforts of Jake Sullivan, the White House national security adviser, to counter China's technological advancements and secure the United States' position as a technological superpower. Sullivan has been involved in orchestrating controls on semiconductor manufacturing equipment and fostering tech partnerships with countries like India and Vietnam. While his actions have constrained China's advancement, the long-term impact on promoting democratic values in the tech sector remains uncertain. The article highlights the complexities and challenges of navigating global technology competition and the potential lasting effects of Sullivan's initiatives. [Extracted from the article]
- Published
- 2025
7. QUANTUM SIMULATIONS.
- Author
- Yazgin, Evrim
- Subjects
- QUANTUM computers, QUBITS, SUPERCOMPUTERS, QUANTUM computing, QUANTUM mechanics, ENGINEERS, LAPTOP computers
- Abstract
The article discusses the basics of quantum computing to give an overview of the future of quantum simulators. Topics include the decades-long development of quantum computers, the significance of quantum computers to quantum chemists, and the problem of isolating the quantum simulator system. Also mentioned are the drawbacks of quantum simulators and a comparison between analogue quantum simulators and digital quantum computers.
- Published
- 2024
8. Toward an extreme-scale electronic structure system.
- Author
- Galvez Vallejo, Jorge L., Snowdon, Calum, Stocks, Ryan, Kazemian, Fazeleh, Yan Yu, Fiona Chuo, Seidl, Christopher, Seeger, Zoe, Alkan, Melisa, Poole, David, Westheimer, Bryce M., Basha, Mehaboob, De La Pierre, Marco, Rendell, Alistair, Izgorodina, Ekaterina I., Gordon, Mark S., and Barca, Giuseppe M. J.
- Subjects
- ATOMIC structure, ELECTRONIC systems, SUPERCOMPUTERS, MATERIALS science, DRUG discovery, QUANTUM chemistry, ELECTRONIC structure
- Abstract
Electronic structure calculations have the potential to predict key matter transformations for applications of strategic technological importance, from drug discovery to materials science and catalysis. However, a predictive physicochemical characterization of these processes often requires accurate quantum chemical modeling of complex molecular systems with hundreds to thousands of atoms. Due to the computationally demanding nature of electronic structure calculations and the complexity of modern high-performance computing hardware, quantum chemistry software has historically failed to operate at such large molecular scales with accuracy and speed that are useful in practice. In this paper, novel algorithms and software are presented that enable extreme-scale quantum chemistry capabilities with particular emphasis on exascale calculations. This includes the development and application of the multi-Graphics Processing Unit (GPU) library LibCChem 2.0 as part of the General Atomic and Molecular Electronic Structure System package and of the standalone Extreme-scale Electronic Structure System (EXESS), designed from the ground up for scaling on thousands of GPUs to perform high-performance accurate quantum chemistry calculations at unprecedented speed and molecular scales. Among various results, we report that the EXESS implementation enables Hartree–Fock/cc-pVDZ plus RI-MP2/cc-pVDZ/cc-pVDZ-RIFIT calculations on an ionic liquid system with 623 016 electrons and 146 592 atoms in less than 45 min using 27 600 GPUs on the Summit supercomputer with a 94.6% parallel efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
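The abstract reports a 94.6% parallel efficiency on 27,600 GPUs. For reference, strong-scaling parallel efficiency is conventionally the measured speedup divided by the increase in resources; the sketch below uses an invented baseline GPU count and invented runtimes purely to illustrate the arithmetic.

```python
def strong_scaling_efficiency(t_ref, p_ref, t_p, p):
    """Efficiency = speedup / resource ratio, relative to a baseline run."""
    return (t_ref / t_p) / (p / p_ref)

# Hypothetical numbers: a 3,450-GPU baseline taking 320 s vs. 42.3 s on 27,600 GPUs.
print(strong_scaling_efficiency(t_ref=320.0, p_ref=3450, t_p=42.3, p=27600))
```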
9. Distributed memory, GPU accelerated Fock construction for hybrid, Gaussian basis density functional theory.
- Author
- Williams-Young, David B., Asadchev, Andrey, Popovici, Doru Thom, Clark, David, Waldrop, Jonathan, Windus, Theresa L., Valeev, Edward F., and de Jong, Wibe A.
- Subjects
- DENSITY functional theory, GRAPHICS processing units, ATOMIC orbitals, SUPERCOMPUTERS, DISTRIBUTED algorithms, ELECTRONIC structure
- Abstract
With the growing reliance of modern supercomputers on accelerator-based architectures such as graphics processing units (GPUs), the development and optimization of electronic structure methods to exploit these massively parallel resources has become a recent priority. While significant strides have been made in the development of GPU-accelerated, distributed memory algorithms for many modern electronic structure methods, the primary focus of GPU development for Gaussian basis atomic orbital methods has been on shared memory systems, with only a handful of examples pursuing massive parallelism. In the present work, we present a set of distributed memory algorithms for the evaluation of the Coulomb and exact exchange matrices for hybrid Kohn–Sham DFT with Gaussian basis sets via direct density-fitted (DF-J-Engine) and seminumerical (sn-K) methods, respectively. The absolute performance and strong scalability of the developed methods are demonstrated on systems ranging from a few hundred to over one thousand atoms using up to 128 NVIDIA A100 GPUs on the Perlmutter supercomputer. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
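The density-fitted Coulomb build (the DF-J step named in the abstract) boils down to three tensor contractions against a fitted auxiliary basis. A serial NumPy sketch with random stand-in tensors; the actual DF-J-Engine distributes this work across GPUs, and real three-center integrals replace the random arrays here.

```python
import numpy as np

rng = np.random.default_rng(0)
nbf, naux = 10, 30                       # toy basis and auxiliary-basis sizes

B = rng.standard_normal((naux, nbf, nbf))
B = (B + B.transpose(0, 2, 1)) / 2       # stand-in 3-center integrals (Q|mu nu)
V = rng.standard_normal((naux, naux))
V = V @ V.T + naux * np.eye(naux)        # SPD stand-in 2-center metric (P|Q)
D = rng.standard_normal((nbf, nbf))
D = (D + D.T) / 2                        # symmetric stand-in density matrix

gamma = np.einsum('Qmn,mn->Q', B, D)     # contract density with 3-center integrals
c = np.linalg.solve(V, gamma)            # apply the inverse metric
J = np.einsum('P,Pmn->mn', c, B)         # assemble the Coulomb matrix
print(J.shape)
```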
10. Refining HPCToolkit for application performance analysis at exascale.
- Author
- Adhianto, Laksono, Anderson, Jonathon, Barnett, Robert Matthew, Grbic, Dragana, Indic, Vladimir, Krentel, Mark, Liu, Yumeng, Milaković, Srđan, Phan, Wileam, and Mellor-Crummey, John
- Subjects
- GRAPHICS processing units, GRAPHICAL user interfaces, SOURCE code, OPEN scholarship, INFORMATION resources
- Abstract
As part of the US Department of Energy's Exascale Computing Project (ECP), Rice University has been refining its HPCToolkit performance tools to better support measurement and analysis of applications executing on exascale supercomputers. To efficiently collect performance measurements of GPU-accelerated applications, HPCToolkit employs novel non-blocking data structures to communicate performance measurements between tool threads and application threads. To attribute performance information in detail to source lines, loop nests, and inlined call chains, HPCToolkit performs parallel analysis of large CPU and GPU binaries involved in the execution of an exascale application to rapidly recover mappings between machine instructions and source code. To analyze terabytes of performance measurements gathered during executions at exascale, HPCToolkit employs distributed-memory parallelism, multithreading, sparse data structures, and out-of-core streaming analysis algorithms. To support interactive exploration of profiles up to terabytes in size, HPCToolkit's hpcviewer graphical user interface uses out-of-core methods to visualize performance data. The result of these efforts is that HPCToolkit now supports collection, analysis, and presentation of profiles and traces of GPU-accelerated applications at exascale. These improvements have enabled HPCToolkit to efficiently measure, analyze and explore terabytes of performance data for executions using as many as 64K MPI ranks and 64K GPU tiles on ORNL's Frontier supercomputer. HPCToolkit's support for measurement and analysis of GPU-accelerated applications has been employed to study a collection of open-science applications developed as part of ECP. This paper reports on these experiences, which provided insight into opportunities for tuning applications, strengths and weaknesses of HPCToolkit itself, as well as unexpected behaviors in executions at exascale. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Benchmarking Quantum Computational Advantages on Supercomputers.
- Author
- Wu, Junjie and Liu, Yong
- Subjects
- QUANTUM computing, SUPERCOMPUTERS, BOSONS, PROBLEM solving, STATISTICAL sampling, QUANTUM computers
- Abstract
The achievement of quantum computational advantage, also known as quantum supremacy, is a major milestone at which a quantum computer can solve a problem significantly faster than the world's most powerful classical computers. Two tasks, boson sampling and random quantum circuit sampling, have experimentally exhibited quantum advantages on photonic and superconducting platforms respectively. Classical benchmarking is essential, yet challenging, because these tasks are intractable for classical computers. This study reviews models, algorithms and large‐scale simulations of these two sampling tasks. These approaches continue to hold substantial significance for research in both current noisy intermediate‐scale quantum (NISQ) systems and future fault‐tolerant quantum computing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
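Part of what makes boson sampling classically hard is that each output probability involves a matrix permanent, which is #P-hard to compute in general. A minimal sketch of Ryser's inclusion-exclusion formula, whose exponential cost is exactly why large-scale classical benchmarking needs supercomputers:

```python
import itertools
import numpy as np

def permanent_ryser(A):
    """Matrix permanent via Ryser's inclusion-exclusion formula."""
    n = A.shape[0]
    total = 0.0
    for r in range(1, n + 1):
        for cols in itertools.combinations(range(n), r):
            rowsums = A[:, cols].sum(axis=1)      # row sums over the column subset
            total += (-1) ** r * np.prod(rowsums)
    return (-1) ** n * total

A = np.ones((3, 3))
print(permanent_ryser(A))   # permanent of the all-ones 3x3 matrix is 3! = 6
```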
12. Architecture and performance of Perlmutter's 35 PB ClusterStor E1000 all‐flash file system.
- Author
-
Lockwood, Glenn K., Chiusole, Alberto, Gerhardt, Lisa, Lozinskiy, Kirill, Paul, David, and Wright, Nicholas J.
- Subjects
PATTERNMAKING ,BANDWIDTHS ,SUPERCOMPUTERS - Abstract
Summary: NERSC's newest system, Perlmutter, features a 35 PB all‐flash Lustre file system built on HPE Cray ClusterStor E1000. We present its architecture, early performance figures, and performance considerations unique to this architecture. We demonstrate the performance of E1000 OSSes through low‐level Lustre tests that achieve over 90% of the theoretical bandwidth of the SSDs at the OST and LNet levels. We also show end‐to‐end performance for both traditional dimensions of I/O performance (peak bulk‐synchronous bandwidth) and nonoptimal workloads endemic to production computing (small, incoherent I/Os at random offsets) and compare them to NERSC's previous system, Cori, to illustrate that Perlmutter achieves the performance of a burst buffer and the resilience of a scratch file system. Finally, we discuss performance considerations unique to all‐flash Lustre and present ways in which users and HPC facilities can adjust their I/O patterns and operations to make optimal use of such architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Supercomputer framework for reverse engineering firing patterns of neuron populations to identify their synaptic inputs.
- Author
- Chardon, Matthieu, Wang, Y. Curtis, Garcia, Marta, Besler, Emre, Beauchamp, J. Andrew, D'Mello, Michael, Powers, Randall K., and Heckman, Charles J.
- Subjects
- REVERSE engineering, FIRE protection engineering, SUPERCOMPUTERS, NEUROMODULATION, NEURONS
- Abstract
In this study, we develop new reverse engineering (RE) techniques to identify the organization of the synaptic inputs generating firing patterns of populations of neurons. We tested these techniques in silico to allow rigorous evaluation of their effectiveness, using remarkably extensive parameter searches enabled by massively parallel computation on supercomputers. We chose spinal motoneurons as our target neural system, since motoneurons process all motor commands and have well-established input-output properties. One set of simulated motoneurons was driven by 300,000+ simulated combinations of excitatory, inhibitory, and neuromodulatory inputs. Our goal was to determine if these firing patterns had sufficient information to allow RE identification of the input combinations. Like other neural systems, the motoneuron input-output system is likely non-unique. This non-uniqueness could potentially limit this RE approach, as many input combinations can produce similar outputs. However, our simulations revealed that firing patterns contained sufficient information to sharply restrict the solution space. Thus, our RE approach successfully generated estimates of the actual simulated patterns of excitation, inhibition, and neuromodulation, with variance accounted for ranging from 75% to 90%. It was striking that nonlinearities induced in firing patterns by the neuromodulation inputs did not impede RE, but instead generated distinctive features in firing patterns that aided RE. These simulations demonstrate the potential of this form of RE analysis. It is likely that the ever-increasing capacity of supercomputers will allow increasingly accurate RE of neuron inputs from their firing patterns in many neural systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Implementation and testing of a KNS topology in an InfiniBand cluster.
- Author
- Gomez-Lopez, Gabriel, Escudero-Sahuquillo, Jesus, Garcia, Pedro J., and Quiles, Francisco J.
- Subjects
- TOPOLOGY, SUPERCOMPUTERS, SERVER farms (Computer network management), COMPUTER software, ENGINES
- Abstract
The InfiniBand (IB) interconnection technology is widely used in the networks of modern supercomputers and data centers. Among other advantages, the IB-based network devices allow for building multiple network topologies, and the IB control software (subnet manager) supports several routing engines suitable for the most common topologies. However, the implementation of some novel topologies in IB-based networks may be difficult if suitable routing algorithms are not supported, or if the IB switch or NIC architectures are not directly applicable for that topology. This work describes the implementation of the network topology known as KNS in a real HPC cluster using an IB network. As far as we know, this is the first implementation of this topology in an IB-based system. In more detail, we have implemented the KNS routing algorithm in the OpenSM software distribution of the subnet manager, and we have adapted the available IB-based switches to the particular structure of this topology. We have evaluated the correctness of our implementation through experiments in the real cluster, using well-known benchmarks. The obtained results, which match the expected performance for the KNS topology, show that this topology can be implemented in IB-based clusters as an alternative to other interconnection patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication.
- Author
- Watanabe, Yutaka, Tsuji, Miwako, Murai, Hitoshi, Boku, Taisuke, and Sato, Mitsuhisa
- Subjects
- MESSAGE passing (Computer science), TOFU, INTERNATIONAL communication, SUPERCOMPUTERS, SYNCHRONIZATION
- Abstract
The increasing prevalence of manycore processors makes multithreaded communication more important in order to avoid costly global synchronization among cores. One representative approach that requires multithreaded communication is the global task-based programming model, in which a program is divided into tasks that are executed asynchronously by each node, with independent thread-to-thread communications expected. However, Message Passing Interface (MPI)-based approaches are not efficient here because of design issues. In this research, we design and implement the utofu transport layer in the abstracted communication library Unified Communication X (UCX) for efficient remote direct memory access (RDMA) based multithreaded communication on the Tofu Interconnect D. Evaluation results on Fugaku show that UCX can significantly improve multithreaded performance over MPI while maintaining portability between systems. UCX shows about 32.8 times lower latency than Fujitsu MPI with 24 threads in the multithreaded pingpong benchmark and about 37.8 times higher update rate than Fujitsu MPI with 24 threads on 256 nodes in the multithreaded GUPS benchmark. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
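The pingpong benchmark cited in the abstract bounces a message between two endpoints and halves the measured round-trip time. A minimal single-threaded MPI analog in mpi4py (the paper's measurements exercise the UCX utofu layer with many threads per node, which this sketch does not model):

```python
# Run with two ranks, e.g.: mpiexec -n 2 python pingpong.py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(8, dtype=np.uint8)   # tiny message, so timing is latency-bound
reps = 10000

comm.Barrier()
t0 = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    elif rank == 1:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
t1 = time.perf_counter()

if rank == 0:
    print(f"one-way latency: {(t1 - t0) / reps / 2 * 1e6:.2f} us")
```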
16. High-performance finite elements with MFEM.
- Author
- Andrej, Julian, Atallah, Nabil, Bäcker, Jan-Phillip, Camier, Jean-Sylvain, Copeland, Dylan, Dobrev, Veselin, Dudouit, Yohann, Duswald, Tobias, Keith, Brendan, Kim, Dohyun, Kolev, Tzanio, Lazarov, Boyan, Mittal, Ketan, Pazner, Will, Petrides, Socratis, Shiraiwa, Syun'ichi, Stowell, Mark, and Tomov, Vladimir
- Subjects
- FINITE element method, COMPUTATIONAL physics, DISCRETIZATION methods, C++, SUPERCOMPUTERS
- Abstract
The MFEM (Modular Finite Element Methods) library is a high-performance C++ library for finite element discretizations. MFEM supports numerous types of finite element methods and is the discretization engine powering many computational physics and engineering applications across a number of domains. This paper describes some of the recent research and development in MFEM, focusing on performance portability across leadership-class supercomputing facilities, including exascale supercomputers, as well as new capabilities and functionality, enabling a wider range of applications. Much of this work was undertaken as part of the Department of Energy's Exascale Computing Project (ECP) in collaboration with the Center for Efficient Exascale Discretizations (CEED). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Adapting arepo-rt for exascale computing: GPU acceleration and efficient communication.
- Author
- Zier, Oliver, Kannan, Rahul, Smith, Aaron, Vogelsberger, Mark, and Verbeek, Erkin
- Subjects
- RADIATIVE transfer, AGE of stars, MIDDLE Ages, SUPERCOMPUTERS, COMMUNICATION strategies, GRAPHICS processing units
- Abstract
Radiative transfer (RT) is a crucial ingredient for self-consistent modelling of numerous astrophysical phenomena across cosmic history. However, on-the-fly integration into radiation hydrodynamics (RHD) simulations is computationally demanding, particularly due to the stringent time-stepping conditions and increased dimensionality inherent in multifrequency collisionless Boltzmann physics. The emergence of exascale supercomputers, equipped with extensive CPU cores and GPU accelerators, offers new opportunities for enhancing RHD simulations. We present the first steps towards optimizing arepo-rt for such high-performance computing environments. We implement a novel node-to-node (n-to-n) communication strategy that utilizes shared memory to substitute intranode communication with direct memory access. Furthermore, combining multiple internode messages into a single message substantially enhances network bandwidth utilization and performance for large-scale simulations on modern supercomputers. The single-message n-to-n approach also improves performance on smaller scale machines with less optimized networks. Furthermore, by transitioning all RT-related calculations to GPUs, we achieve a significant computational speedup of around 15x for standard benchmarks compared to the original CPU implementation. As a case study, we perform cosmological RHD simulations of the Epoch of Reionization, employing a similar setup as the THESAN project. In this context, RT becomes sub-dominant such that even without modifying the core arepo codebase, there is an overall threefold improvement in efficiency. The advancements presented here have broad implications, potentially transforming the complexity and scalability of future simulations for a wide variety of astrophysical studies. Our work serves as a blueprint for porting similar simulation codes based on unstructured resolution elements to GPU-centric architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. An introduction to quantum computing for statisticians and data scientists.
- Author
- Lopatnikova, Anna, Tran, Minh-Ngoc, and Sisson, Scott A.
- Subjects
- QUANTUM computing, SUPERCOMPUTERS, MACHINE learning, DESCRIPTIVE statistics, LINEAR algebra
- Abstract
Quantum computers promise to surpass the most powerful classical supercomputers in tackling critical practical problems, such as designing pharmaceuticals and fertilizers, optimizing supply chains and traffic, and enhancing machine learning. Since quantum computers operate fundamentally differently from classical ones, their emergence will give rise to a new evolutionary branch of statistical and data analytics methodologies. This review aims to provide an introduction to quantum computing accessible to statisticians and data scientists, equipping them with a comprehensive framework, the basic language, and building blocks of quantum algorithms, as well as an overview of existing quantum applications in statistics and data analysis. Our objective is to empower statisticians and data scientists to follow quantum computing literature relevant to their fields, collaborate with quantum algorithm designers, and ultimately drive the development of the next generation of statistical and data analytics tools. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
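As a taste of the linear-algebra building blocks such a review covers, here is a two-qubit state-vector simulation in NumPy: gates are unitary matrices, states are complex amplitude vectors, and measurement probabilities are squared amplitude magnitudes. This is textbook material, not code from the review.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # Hadamard gate
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.zeros(4, dtype=complex)
state[0] = 1.0                                  # start in |00>
state = CNOT @ (np.kron(H, I) @ state)          # Bell state (|00> + |11>)/sqrt(2)

probs = np.abs(state) ** 2                      # Born rule
print(dict(zip(['00', '01', '10', '11'], probs.round(3))))
# {'00': 0.5, '01': 0.0, '10': 0.0, '11': 0.5}
```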
19. Nucleosynthesis in Type Ia Supernovae, Classical Novae, and Type I X-Ray Bursts. A Primer on Stellar Explosions.
- Author
- José, Jordi
- Subjects
- NUCLEOSYNTHESIS, TYPE I supernovae, X-ray bursts, COSMOCHEMISTRY, SUPERCOMPUTERS
- Abstract
Nuclear astrophysics aims at unraveling the cosmic origins of chemical elements and the physical processes powering stars. It constitutes a truly multidisciplinary field, that integrates tools, advancements, and accomplishments from theoretical astrophysics, observational astronomy, cosmochemistry, and theoretical and experimental atomic and nuclear physics. For instance, the advent of high-energy astrophysics, facilitated by space-borne observatories, has ushered in a new era, offering a unique, panchromatic view of the universe (i.e., allowing multifrequency observations of stellar events); supercomputers are also playing a pivotal role, furnishing astrophysicists with computational capabilities essential for studying the intricate evolution of stars within a multidimensional framework; cosmochemists, through examination of primitive meteorites, are uncovering tiny fragments of stardust, shedding light on the physical processes operating in stars and on the mechanisms that govern condensation of stellar ejecta into solids; simultaneously, nuclear physicists managed to measure nuclear reactions at (or close to) stellar energies, using both stable and radioactive ion beam facilities. This paper provides a multidisciplinary view on nucleosynthesis accompanying stellar explosions, with a specific focus on thermonuclear supernovae, classical novae, and type I X-ray bursts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Supercomputers help to optimise track maintenance
- Author
- Kono, Akiko
- Subjects
- Numerical analysis, Multiprocessing, Railroads, Supercomputers, Simulation methods, Supercomputer, Business, Transportation industry
- Abstract
Japan's Railway Technical Research Institute (RTRI) is using supercomputer-driven numerical analysis to optimise track maintenance. Dr Akiko Kono* presents the findings from numerical simulation replicating ballast tamping. FOR over a [...]
- Published
- 2024
21. Computing comes to life.
- Author
- Gent, Edd
- Subjects
- ARTIFICIAL intelligence, SUPERCOMPUTERS, COMPUTER engineering, BACTERIAL colonies, WEARABLE technology, BIOLOGICAL systems
- Abstract
This CPU was inserted into a cell where it regulated the activity of different genes in response to specially designed sequences of RNA, a form of genetic material, letting the researchers prompt the cell to implement logic gates akin to those in silicon computers. Believe it or not, the bacteria contain more circuits and more processing power. That is perhaps not so surprising when you consider that all life computes: from individual cells responding to chemical signals to complex organisms navigating their environment, information processing is central to living systems. The goal is the inverse of artificial intelligence: rather than making computers more brain-like, they will attempt to make brain cells more computer-like. In the past 20 years, armed with new and more powerful tools to engineer cells and molecules, researchers have finally begun to demonstrate the potential of using biological material to build computers that actually work. [Extracted from the article]
- Published
- 2023
- Full Text
- View/download PDF
22. Qinghai University to Host the ASC 2025 Finals
- Subjects
- Supercomputers, Supercomputer, Arts and entertainment industries
- Abstract
The 2025 ASC Student Supercomputer Challenge (ASC25) launch event was held recently in Beijing. The gathering brought together leading academicians, HPC and AI experts, and student-teacher representatives, highlighting the growing [...]
- Published
- 2025
23. HPE may have beaten Supermicro and Dell to win a $1bn AI contract, but it's not for the Colossus supercomputer
- Author
- Williams, Wayne
- Subjects
- Supercomputers, Social networks, Supercomputer, Contract agreement, Science and technology
- Abstract
Byline: Wayne Williams HPE beat Supermicro and Dell to win a $1bn contract to supply Elon Musk's X with AI servers * Bloomberg reports the contract is for Elon Musk's [...]
- Published
- 2025
24. Nvidia’s tiny $3k AI mini PC is a glimpse of what’s next for Windows PCs
- Subjects
- Supercomputers, Supercomputer, Computers
- Abstract
When I first saw that photo of Project Digits unveiled at CES 2025, I couldn’t help but notice the Apple influence — minimalist, sleek, next to a monitor that looks like Apple’s [...]
- Published
- 2025
25. How Quantum Computing Differs from Classical Computing
- Subjects
- Supercomputers, Computer science, Algorithms, Supercomputer, Algorithm, Computers
- Abstract
Byline: Dharmendra Patel Quantum computing is a new and developing area of computer science that can solve problems that even the most potent classical computers cannot. Quantum hardware and quantum [...]
- Published
- 2025
26. Here's how small Nvidia's $3,000 Digits supercomputer looks in person
- Subjects
- Supercomputers, Supercomputer
- Abstract
Byline: Jay Peters One of the biggest announcements in Nvidia CEO Jensen Huang's CES keynote was the small 'Project Digits' AI supercomputer, and if you want to get an idea [...]
- Published
- 2025
27. Nvidia's mini 'desktop supercomputer' is 1,000 times more powerful than a laptop -- and it can fit in your bag
- Subjects
- Supercomputers, Supercomputer, News, opinion and commentary
- Abstract
Scientists have created a new mini PC that is almost as powerful as a supercomputer but can fit in your bag. The new device, dubbed 'Project Digits,' is designed for [...]
- Published
- 2025
28. Accelerating the density-functional tight-binding method using graphical processing units.
- Author
- Vuong, Van-Quan, Cevallos, Caterina, Hourahine, Ben, Aradi, Bálint, Jakowski, Jacek, Irle, Stephan, and Camacho, Cristopher
- Subjects
- LINEAR algebra, DENSITY matrices, WATER clusters, CHEMICAL systems, CARBON nanotubes, SUPERCOMPUTERS
- Abstract
Acceleration of the density-functional tight-binding (DFTB) method on single and multiple graphical processing units (GPUs) was accomplished using the MAGMA linear algebra library. Two major computational bottlenecks of DFTB ground-state calculations were addressed in our implementation: the Hamiltonian matrix diagonalization and the density matrix construction. The code was implemented and benchmarked on two different computer systems: (1) the SUMMIT IBM Power9 supercomputer at the Oak Ridge National Laboratory Leadership Computing Facility with 1–6 NVIDIA Volta V100 GPUs per computer node and (2) an in-house Intel Xeon computer with 1–2 NVIDIA Tesla P100 GPUs. The performance and parallel scalability were measured for three molecular models of 1-, 2-, and 3-dimensional chemical systems, represented by carbon nanotubes, covalent organic frameworks, and water clusters. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
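The two bottlenecks the abstract names, Hamiltonian diagonalization and density-matrix construction, have a compact serial form. A NumPy/SciPy sketch with random stand-ins for the DFTB Hamiltonian H and overlap S (MAGMA offloads exactly these dense linear-algebra steps to GPUs):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n, n_occ = 8, 3                          # orbitals and doubly occupied levels (toy)

H = rng.standard_normal((n, n)); H = (H + H.T) / 2            # stand-in Hamiltonian
S = rng.standard_normal((n, n)); S = S @ S.T + n * np.eye(n)  # SPD stand-in overlap

eps, C = eigh(H, S)                      # generalized eigenproblem H C = S C eps
D = 2.0 * C[:, :n_occ] @ C[:, :n_occ].T  # closed-shell density matrix
print(np.trace(D @ S))                   # tr(D S) = electron count = 2 * n_occ = 6
```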
29. Elon Musk's xAI supercomputer gets 150MW power boost despite concerns over grid impact and local power stability
- Author
- Udinmwen, Efosa
- Subjects
- United States. Tennessee Valley Authority, Nuclear energy, Supercomputers, Supercomputer, Science and technology
- Abstract
Byline: Efosa Udinmwen xAI gains 150MW power approval for its supercomputer, but locals fear impact on grid and future power availability. * Elon Musk's xAI supercomputer gets power boost amid [...]
- Published
- 2024
30. Broadcom Building Three One Million GPU AI Supercomputers for 2027
- Subjects
- Supercomputers, Supercomputer, Consumer news and advice, General interest
- Abstract
Broadcom predicts the serviceable market for AI XPUs and networking could reach $60 billion to $90 billion by 2027, and the company is a market leader. During Broadcom's Q4 2024 earnings (https://edge.media-server.com/mmc/p/v236um4d/) [...]
- Published
- 2024
31. In 1996, AI beat a grandmaster at chess. In 2024, the stakes are higher
- Author
- Bonifield, Stevie
- Subjects
- International Business Machines Corp. -- Investments, Chess, Supercomputers, Supercomputer, Company investment, Science and technology
- Abstract
Byline: Stevie Bonifield Who comes out tops when humans and AI go head-to-head? On February 10, 1996, chess grandmaster Garry Kasparov played against Deep Blue, an IBM supercomputer, in the [...]
- Published
- 2024
32. Hewlett Packard Enterprise secures EUR250m deal for advanced supercomputer in Germany
- Subjects
- Hewlett Packard Enterprise Co., High technology industry, Supercomputers, Supercomputer, Business, Computers and office automation industries
- Abstract
WORLDWIDE COMPUTER PRODUCTS NEWS, December 13, 2024: Technology company Hewlett Packard Enterprise (NYSE: HPE) announced on Friday [...]
- Published
- 2024
33. Will the world's fastest supercomputer please stand up?
- Subjects
- Supercomputers, Supercomputer, Consumer news and advice, General interest
- Abstract
Oracle and xAI love to flex the size of their GPU clusters. It's getting hard to tell who has the most supercomputing power as more firms claim the top spot. [...]
- Published
- 2024
34. Wafer Scale Engines For AI Efficiency
- Subjects
- Taiwan Semiconductor Manufacturing Company Ltd., Transistors, Integrated circuits, Semiconductor chips, Supercomputers, Supercomputer, Standard IC, Electronics
- Abstract
Byline: EFY Bureau Today's most advanced computer chips have features a mere few dozen nanometers in size. While powerful chips, including those from NVIDIA and TSMC, continue down the miniaturization path, [...]
- Published
- 2024
35. Cerebras Reports New World Mark in Molecular Dynamics
- Subjects
- Molecular dynamics, Supercomputers, Supercomputer, Arts and entertainment industries
- Abstract
Cerebras Systems, a company focusing on generative AI, in collaboration with researchers from Sandia, Lawrence Livermore, and Los Alamos National Laboratories, on November 18 reported they have set another world [...]
- Published
- 2024
36. Silicon and supercomputers will define the next AI era. AWS just made a big bet on both
- Subjects
- Supercomputers, Silicon, Supercomputer, Consumer news and advice, General interest
- Abstract
AWS unveiled a new AI chip and a supercomputer at its re:Invent conference on Tuesday. It's a sign that Amazon is ready to reduce its reliance on Nvidia for [...]
- Published
- 2024
37. ASC25 Student Supercomputer Challenge Previews Upcoming Event
- Subjects
- College students, Supercomputers, Supercomputer, Arts and entertainment industries
- Abstract
The ASC Student Supercomputer Challenge 2025 (ASC25) kicked off its inaugural briefing session at SC24, unveiling a timeline for the upcoming competition. Aspiring undergraduate teams from around the world are [...]
- Published
- 2024
38. THE AI EVOLUTION.
- Subjects
- ARTIFICIAL neural networks, GENERATIVE artificial intelligence, VACUUM tubes, SONG lyrics, CHATGPT, SUPERCOMPUTERS
- Abstract
This article from National Geographic explores the evolution of AI systems and their potential future developments. It highlights the advancements in artificial neural networks, from early prototypes like the SNARC in 1951 to modern networks with billions of connections. The article also discusses the power of supercomputers, such as the recently installed El Capitan, which can perform two quintillion calculations per second. Additionally, it touches on the unexpected possibilities of AI, including its potential to redefine artistic limits and offer new styles of creativity and entertainment. However, the article emphasizes that the inner workings of AI programs are not fully understood, leaving room for further experimentation and acknowledging that AI has strengths and weaknesses. [Extracted from the article]
- Published
- 2024
39. COMPUTING’S CLIMATE COSTS.
- Author
- Hulick, Kathryn
- Subjects
- ARTIFICIAL intelligence, LANGUAGE models, GRAPHICS processing units, STABLE Diffusion, COMPUTER engineering, SERVER farms (Computer network management), SUPERCOMPUTERS
- Abstract
The article explores the environmental impact of artificial intelligence (AI) and the growing energy and water consumption of data centers that support AI technologies. The use of large language models (LLMs) in AI, such as ChatGPT, has led to increased energy consumption and greenhouse gas emissions. Researchers are working on developing more sustainable AI models and finding ways to reduce the carbon footprint of data centers. Strategies include optimizing AI models for specific tasks, implementing power-capping techniques, and exploring alternative approaches to AI that require less energy and data. The article emphasizes the need for a more sustainable and environmentally conscious approach to AI development. [Extracted from the article]
- Published
- 2024
40. Cloud om de hoek (Cloud Around the Corner).
- Author
- DIJK, PANCRAS
- Subjects
- RESEARCH parks, DATA mapping, KICKING (Football), SERVER farms (Computer network management), SUPERCOMPUTERS
- Abstract
This article provides an overview of the significance and growth of data centers in the Netherlands. It emphasizes the role of data centers as the foundation of our digital world and highlights the country's favorable conditions for hosting these facilities. The article acknowledges the challenges of limited space and power supply, as well as the environmental impact of data centers. It also emphasizes the importance of data centers in supporting various industries and the need for constant operational reliability. The article concludes by mentioning efforts to regulate the construction of new data centers due to concerns about housing shortages and power scarcity. [Extracted from the article]
- Published
- 2024
41. Assessment of Metal Foil Pump Configurations for EU-DEMO.
- Author
- Luo, Xueli, Kathage, Yannick, Teichmann, Tim, Hanke, Stefan, Giegerich, Thomas, and Day, Christian
- Subjects
- METAL foils, HYDROGEN isotopes, FUEL cycle, SUPERCOMPUTERS, HELIUM
- Abstract
It is a challenging but key task to reduce the tritium inventory in EU-DEMO to levels that are acceptable to a nuclear regulator. As a solution to this issue, a smart fuel cycle architecture is proposed based on the concept of Direct Internal Recycling (DIR), in which the Metal Foil Pump (MFP) will play an important role in separating the unburnt hydrogen isotopes coming from the divertor by exploiting the superpermeation phenomenon. In this study, we present an assessment of the performance of the lower port of EU-DEMO after the integration of the MFP. For the first time, a thorough comparison of three different MFP designs (parallel long tubes, sandwich, and halo) is performed regarding conductance for helium molecules, pumping speed, and the separation factor for deuterium molecules under different physical and geometric parameters. All simulations were carried out on the Marconi-Fusion supercomputer with our in-house Test Particle Monte Carlo (TPMC) simulation code ProVac3D, which has been parallelized with high efficiency. These results are essential for the development of a suitable MFP design in the vacuum-pumping train of EU-DEMO. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
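Test Particle Monte Carlo in the free-molecular regime traces independent particles between wall collisions, re-emitting them diffusely with a cosine-law distribution. The sketch below is a heavily simplified 2D-channel analog of that method class and only estimates a transmission probability; ProVac3D handles real 3D pump geometry and the foil's superpermeation, neither of which is modeled here.

```python
import numpy as np

rng = np.random.default_rng(42)

def transmission_probability(L, h=1.0, n_particles=20000):
    """Free-molecular transmission through a 2D channel of gap h and length L."""
    transmitted = 0
    for _ in range(n_particles):
        y, z = h * rng.random(), 0.0
        theta = np.arcsin(2 * rng.random() - 1)   # cosine law about +z at the inlet
        vy, vz = np.sin(theta), np.cos(theta)
        while True:
            # flight time until the top or bottom wall would be hit
            t_wall = (h - y) / vy if vy > 0 else (-y / vy if vy < 0 else np.inf)
            z_hit = z + vz * t_wall
            if vz > 0 and z_hit >= L:
                transmitted += 1                  # reaches the outlet first
                break
            if vz < 0 and z_hit <= 0:
                break                             # flies back out of the inlet
            # hits a wall: diffuse re-emission about the inward normal
            y, z = (h if vy > 0 else 0.0), z_hit
            theta = np.arcsin(2 * rng.random() - 1)
            vy = -np.cos(theta) if y == h else np.cos(theta)
            vz = np.sin(theta)
    return transmitted / n_particles

for L in (0.5, 1.0, 5.0):
    print(f"L/h = {L}: W = {transmission_probability(L):.3f}")
```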
42. Exploring potential therapeutic combinations for castration-sensitive prostate cancer using supercomputers: a proof of concept study.
- Author
- Tomić, Draško, Murgić, Jure, Fröbe, Ana, Skala, Karolj, Vrljičak, Antonela, Medved Rogina, Branka, Kolarek, Branimir, and Bojović, Viktor
- Subjects
- PROSTATE cancer, PROOF of concept, DRUG efficacy, BANKING industry, SMALL molecules, ANDROGEN receptors, SUPERCOMPUTERS
- Abstract
To address the challenge of finding new combination therapies against castration-sensitive prostate cancer, we introduce Vini, a computational tool that predicts the efficacy of drugs and their combinations at the intracellular level by integrating data from the KEGG, DrugBank, PubChem, Protein Data Bank, UniProt, NCI-60, and COSMIC databases. Vini addresses the problem comprehensively by considering all known target genes, proteins, and small molecules, together with their mutual interactions, involved in the onset and development of cancer. The results point to new, previously unexplored combination therapies that could theoretically be promising candidates for the treatment of castration-sensitive prostate cancer and could prevent the otherwise inevitable progression of the cancer to the incurable castration-resistant stage. Furthermore, analysis of the obtained triple drug combinations and their targets revealed the most common targets: ALK, BCL-2, mTOR, DNA, and the androgen axis. These results may help to define future therapies against castration-sensitive prostate cancer. The use of the Vini computational model to explore therapeutic combinations represents an innovative approach in the search for effective treatments for castration-sensitive prostate cancer and, if clinically validated, could potentially lead to new breakthrough therapies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Simulating stellar merger using HPX/Kokkos on A64FX on Supercomputer Fugaku.
- Author
- Diehl, Patrick, Daiß, Gregor, Huck, Kevin, Marcello, Dominic, Shiber, Sagiv, Kaiser, Hartmut, and Pflüger, Dirk
- Subjects
- STELLAR mergers, SUPERCOMPUTERS, HIGH performance computing, GRAPHICS processing units, C++
- Abstract
The increasing availability of machines relying on non-GPU architectures, such as ARM A64FX in high-performance computing, provides a set of interesting challenges to application developers. In addition to requiring code portability across different parallelization schemes, programs targeting these architectures have to be highly adaptable in terms of compute kernel sizes to accommodate different execution characteristics for various heterogeneous workloads. In this paper, we demonstrate an approach to writing compute kernels using Kokkos' abstraction layer to be executed on x86 and A64FX CPUs and NVIDIA GPUs. In addition to applying Kokkos as an abstraction over the execution of compute kernels on different heterogeneous execution environments, we show that the use of standard C++ constructs, as exposed by the HPX runtime system, enables platform portability based on the real-world Octo-Tiger astrophysics application. We report our experience with porting Octo-Tiger to the ARM A64FX architecture provided by Stony Brook's Ookami and Riken's Supercomputer Fugaku and compare the resulting performance with that achieved on well-established GPU-oriented HPC machines such as ORNL's Summit, NERSC's Perlmutter, and CSCS's Piz Daint systems. Octo-Tiger scaled well on Supercomputer Fugaku without any major code changes due to the abstraction levels provided by HPX and Kokkos. Adding vectorization support for ARM's SVE to Octo-Tiger was trivial thanks to using standard C++ interfaces. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. h5bench : A unified benchmark suite for evaluating HDF5 I/O performance on pre‐exascale platforms.
- Author
- Bez, Jean Luca, Tang, Houjun, Breitenfeld, Scot, Zheng, Huihuo, Liao, Wei-Keng, Hou, Kaiyuan, Huang, Zanhua, and Byna, Suren
- Subjects
- SUPERCOMPUTERS, HIGH performance computing, CUBES, SCARCITY
- Abstract
Summary: Parallel I/O is a critical technique for moving data between the compute and storage subsystems of supercomputers. With massive amounts of data produced or consumed by compute nodes, high-performance parallel I/O is essential. I/O benchmarks play an important role in this process; however, there is a scarcity of I/O benchmarks representative of current workloads on HPC systems. Toward creating representative I/O kernels from real-world applications, we have created h5bench, a set of I/O kernels that exercise hierarchical data format version 5 (HDF5) I/O on parallel file systems in numerous dimensions. Our focus on HDF5 is due to the parallel I/O library's heavy usage in various scientific applications running on supercomputing systems. The dimensions benchmarked in the h5bench suite include I/O operations (read and write), data locality (arrays of basic data types and arrays of structures), array dimensionality (one-dimensional arrays, two-dimensional meshes, three-dimensional cubes), and I/O modes (synchronous and asynchronous). In this paper, we present the observed performance of h5bench executed along several of these dimensions on existing supercomputers (Cori and Summit) and pre-exascale platforms (Perlmutter, Theta, and Polaris). h5bench measurements can be used to identify performance bottlenecks and their root causes and to evaluate I/O optimizations. As the I/O patterns of h5bench are diverse and capture the I/O behaviors of various HPC applications, this study will be helpful to the broader supercomputing and I/O community. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
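h5bench itself is a compiled suite driving parallel HDF5, but the storage model it exercises is easy to demonstrate from Python with h5py. A serial sketch (file name, dataset path, and chunk size are arbitrary choices; real h5bench runs use MPI-parallel I/O and much larger data):

```python
import numpy as np
import h5py

# Write a 1D array of particle data, then read a contiguous slice back.
data = np.arange(1_000_000, dtype=np.float64)

with h5py.File("sample.h5", "w") as f:
    dset = f.create_dataset("particles/x", data=data, chunks=(65536,))
    dset.attrs["units"] = "cm"            # HDF5 attributes carry metadata

with h5py.File("sample.h5", "r") as f:
    chunk = f["particles/x"][0:65536]     # partial read of one chunk
    print(chunk.mean())
```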
45. The quest for a practical quantum advantage or the importance of applications for quantum computing.
- Author
- Lorenz, Jeanette Miriam
- Subjects
- QUANTUM computing, TECHNOLOGICAL innovations, MACHINE learning, PROBLEM solving, SUPERCOMPUTERS
- Abstract
Quantum computing, as an emerging technology, raises high expectations: to accelerate complicated calculations beyond the capabilities of presently available supercomputers and to provide a solution to the ever-increasing challenge of processing large amounts of data. Beyond theoretical promises, quantum computers have been successfully realized in experimental settings during the past few years and can now also be accessed commercially. Given this availability, their application potential can be scrutinized. Promising disruptive changes in how problems from the domains of simulation, optimization, and machine learning are addressed and solved, quantum computers have been researched in the context of many application scenarios in industrial and academic settings in recent years. However, it turns out that currently available quantum computers are still too small in size and too limited in quality to fulfill the high hopes placed in them at the moment. This review reflects on the current state of the technology from the perspective of applications and indicates the developments required on the hardware, software, and algorithmic sides to eventually make quantum computing beneficial for applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Whispering gallery mode sensing through the lens of quantum optics, artificial intelligence, and nanoscale catalysis.
- Author
- Zossimova, Ekaterina, Jones, Callum, Perera, Kulathunga Mudalige Kalani, Pedireddy, Srikanth, Walter, Michael, and Vollmer, Frank
- Subjects
- WHISPERING gallery modes, ARTIFICIAL intelligence, MOLECULAR polarizability, LIGHT sources, SINGLE molecules, QUANTUM optics, QUANTUM computers, SUPERCOMPUTERS, MOLECULAR spectroscopy
- Abstract
Ultra-sensitive sensors based on the resonant properties of whispering gallery modes (WGMs) can detect fractional changes in nanoscale environments down to the length and time scales of single molecules. However, it is challenging to isolate single-molecule signals from competing noise sources in experiments, such as thermal and mechanical sources of noise, and—at the most fundamental level—the shot noise limit of classical light. Additionally, in contrast to traditional bulk refractive index measurements, analyzing single-molecule signals is complicated by the localized nature of their interactions with nanoscale field gradients. This perspective discusses multifaceted solutions to these challenges, including the use of quantum light sources to boost the signal-to-noise ratio in experiments and leveraging the power of supercomputers to predict the electronic response of molecules to WGM optoplasmonic fields. We further discuss the role of machine learning in WGM sensing, including several advanced models that can predict molecular polarizability and solvent effects. These advancements in WGM spectroscopy and computational modeling can help to decipher the molecular mechanics of enzymes, enable studies of catalysis on the nanoscale, and probe the quantum nature of molecules. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Optimization Research of Heterogeneous 2D-Parallel Lattice Boltzmann Method Based on Deep Computing Unit.
- Author
- Tao, Shunan, Li, Qiang, Zhou, Quan, Han, Zhaobing, and Lu, Lu
- Subjects
- LATTICE Boltzmann methods, CENTRAL processing units, FLOW simulations, HETEROGENEOUS computing, SUPERCOMPUTERS, PARALLEL algorithms
- Abstract
Currently, research on the lattice Boltzmann method mainly focuses on its numerical simulation and applications, and there is an increasing demand for large-scale simulations in practical scenarios. In response to this situation, this study successfully implemented a large-scale heterogeneous parallel algorithm for the lattice Boltzmann method using OpenMP, MPI, Pthread, and OpenCL parallel technologies on the "Dongfang" supercomputer system. The accuracy and effectiveness of this algorithm were verified through the lid-driven cavity flow simulation. The paper focused on optimizing the algorithm in four aspects: Firstly, non-blocking communication was employed to overlap communication and computation, thereby improving parallel efficiency. Secondly, high-speed shared memory was utilized to enhance memory access performance and reduce latency. Thirdly, a balanced computation between the central processing unit and the accelerator was achieved through proper task partitioning and load-balancing strategies. Lastly, memory access efficiency was improved by adjusting the memory layout. Performance testing demonstrated that the optimized algorithm exhibited improved parallel efficiency and scalability, with computational performance that is 4 times greater than before optimization and 20 times that of a 32-core CPU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
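For reference on what such a parallelization targets, one D2Q9 lattice Boltzmann step has a compact serial form: compute macroscopic density and velocity, relax the distributions toward the BGK equilibrium, and stream along the nine lattice velocities. A NumPy sketch with periodic boundaries only (the paper's lid-driven cavity walls and the OpenMP/MPI/Pthread/OpenCL machinery are omitted):

```python
import numpy as np

# D2Q9 lattice: discrete velocities e_i and weights w_i
e = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
tau = 0.8                                       # BGK relaxation time

nx = ny = 64
f = np.ones((9, ny, nx)) * w[:, None, None]     # fluid at rest, density 1

def lbm_step(f):
    rho = f.sum(axis=0)                         # macroscopic density
    ux = (f * e[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * e[:, 1, None, None]).sum(axis=0) / rho
    usq = ux**2 + uy**2
    for i in range(9):
        eu = e[i, 0] * ux + e[i, 1] * uy
        feq = w[i] * rho * (1 + 3*eu + 4.5*eu**2 - 1.5*usq)   # equilibrium
        f[i] += -(f[i] - feq) / tau                           # BGK collision
        f[i] = np.roll(np.roll(f[i], e[i, 0], axis=1), e[i, 1], axis=0)  # streaming
    return f

f = lbm_step(f)
print(f.sum())   # mass is conserved: nx * ny = 4096
```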
48. Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs.
- Author
- Melesse Vergara, Verónica G., Budiardja, Reuben D., and Joubert, Wayne
- Subjects
- PROGRAMMING languages, ECOSYSTEMS, SUPERCOMPUTERS
- Abstract
Summary: The Oak Ridge Leadership Computing Facility (OLCF) has a long history of supporting and promoting GPU-accelerated computing, starting with the deployment of the Titan supercomputer in 2012 and continuing with the Summit supercomputer, which has a theoretical peak performance of approximately 200 petaflops. Because the majority of Summit's computational power comes from its 27,972 GPUs, users must port their applications to one of the supported programming models in order to make efficient use of the system. To prepare for the transition to Frontier, the OLCF's exascale supercomputer, users will need to adapt to an entirely new ecosystem that includes new hardware and software technologies. First, users will need to familiarize themselves with the AMD Radeon GPU architecture. Furthermore, users who have previously relied on CUDA will need to transition to the Heterogeneous-Computing Interface for Portability (HIP) or one of the other supported programming models (e.g., OpenMP, OpenACC). In this work, we describe our initial experiences and lessons learned in porting three applications or proxy apps currently running on Summit to the HPE/Cray ecosystem to leverage the compute power of AMD GPUs: minisweep, GenASiS, and Sparkler. Each is representative of current production workloads at the OLCF, different programming languages, and different programming models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Predicting accurate batch queue wait times on production supercomputers by combining machine learning techniques.
- Author
- Brown, Nick, Gibb, Gordon, Belikov, Evgenij, and Nash, Rupert
- Subjects
- MACHINE learning, HIGH performance computing, SUPERCOMPUTERS, REGRESSION analysis, FORECASTING
- Abstract
The ability to accurately predict when a job on a supercomputer will leave the queue and start to run is not only beneficial for providing insights to users but can also help enable non-traditional HPC workloads that are not necessarily suited to the batch-queue approach ubiquitous on production HPC machines. However, there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depends upon many factors. In this work, we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behavior resulting from the queue policy and other interactions to generate accurate job start times. For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600), and 4-cabinet (HPE Cray EX), we explore how different machine learning approaches and techniques improve the accuracy of our predictions, comparing against the estimation generated by Slurm. By combining categorization and regression models, we demonstrate that our approach delivers the most accurate predictions across our machines of interest, with the result that we can predict job start times within 1 min of the actual start time for around 65% of jobs on ARCHER2 and 4-cabinet, and 76% of jobs on Cirrus. Compared against what Slurm can deliver via the backfill plugin, this represents around 3.8 times better accuracy on ARCHER2 and 18 times better on Cirrus. Furthermore, our approach can accurately predict the start time for three quarters of all jobs within 10 min of the actual start time on ARCHER2 and 4-cabinet, and for 90% of jobs on Cirrus. Whilst the initial driver of this work was to better facilitate non-traditional, interactive, and urgent workloads on HPC machines, the insights gained can also be used to provide wider benefits to users, enrich existing batch queue systems, and inform supercomputing center policy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
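The abstract combines categorization and regression models. One common way to structure that, not necessarily the authors' exact pipeline, is to classify each job into a coarse wait-time bucket and then refine the estimate with a per-bucket regressor; the sketch below uses synthetic data and invented features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
# Invented features: requested nodes, walltime, queue depth, backlog hours
X = rng.random((n, 4))
wait = np.exp(3 * X[:, 2] + 2 * X[:, 3] + rng.normal(0, 0.3, n))  # synthetic target

# Stage 1: classify jobs into coarse wait-time buckets (minutes)
buckets = np.digitize(wait, [1.0, 10.0, 100.0])
clf = RandomForestClassifier(n_estimators=100).fit(X, buckets)

# Stage 2: a regressor per bucket refines the numeric estimate
regs = {b: RandomForestRegressor(n_estimators=100).fit(X[buckets == b],
                                                       wait[buckets == b])
        for b in np.unique(buckets)}

def predict_wait(x):
    b = clf.predict(x.reshape(1, -1))[0]
    return regs[b].predict(x.reshape(1, -1))[0]

print(predict_wait(X[0]), wait[0])   # predicted vs. actual wait for one job
```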
50. Analysis and prediction of performance variability in large-scale computing systems.
- Author
- Salimi Beni, Majid, Hunold, Sascha, and Cosenza, Biagio
- Subjects
- COMPUTER systems, SUPERCOMPUTERS, COMMUNICATION patterns, JOB performance, RESOURCE management, TOPOLOGY
- Abstract
The development of new exascale supercomputers has dramatically increased the need for fast, high-performance networking technology. Efficient network topologies, such as Dragonfly+, have been introduced to meet the demands of data-intensive applications and to match the massive computing power of GPUs and accelerators. However, these supercomputers still face performance variability, mainly caused by the network, that affects system and application performance. This study comprehensively analyzes performance variability on a large-scale HPC system with a Dragonfly+ network topology, focusing on factors such as communication patterns, message size, job placement locality, MPI collective algorithms, and overall system workload. The study also proposes an easy-to-measure metric for estimating the network background traffic generated by other users, which can be used to accurately estimate a job's performance. The insights gained from this study contribute to improving performance predictability, enhancing job placement policies and MPI algorithm selection, and optimizing resource management strategies in supercomputers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF