Preparation and optimization of a diverse workload for a large-scale heterogeneous system

Authors :
Martin Schulz
Ulrike Meier Yang
David F. Richards
Tong Chen
Shiv Sundram
Todd Gamblin
Shelby Lockhart
Phil Regier
David Beckingsale
Ed Zywicz
Ruipeng Li
Giacomo Domeniconi
James C. Sexton
Bob Walkup
Jarom Nelson
Carlos Costa
Hui-Fang Wen
Ramesh Pankajakshan
John A. Gunnels
Xiaohua Zhang
Brian Van Essen
Kathryn M. O'Brien
I-Feng W. Kuo
Johann Dahm
Guillaume Thomas-Collignon
Bert Still
Naoya Maruyama
Jamie A. Bramwell
David Boehme
Kathleen Shoga
Carol S. Woodward
Howard A. Scott
M. P. Katz
Ian Karlin
T Epperly
Tzanio V. Kolev
Eun Kyung Lee
Steven H. Langer
Christopher Ward
David J. Gardner
Sara I. L. Kokkila-Schumacher
Christopher Young
Kevin O'Brien
Barry Chen
Björn Sjögreen
Jose R. Brunheroto
Claudia Misale
Roger Pearce
Guojing Cong
Matthew Legendre
Lu Wang
Jaime H. Moreno
Kathleen McCandless
Cyril Zeller
Rao Nimmakayala
Bronis R. de Supinski
Xinyu Que
Sorin Bastea
Robert D. Falgout
Peng Wang
Charway R. Cooper
Aaron Fisher
Jim Brase
R. Neely
David Appelhans
Alexey Voronin
James N. Glosli
Slaven Peles
Pei-Hung Lin
Tony Degroot
Hai Le
Daniel A. White
Levi Barnes
Steve Rennich
Yoonho Park
Peter D. Barnes
Bob Anderson
Jonathan J. Wong
Robert C. Blake
Source :
SC
Publication Year :
2019
Publisher :
ACM, 2019.

Abstract

Productivity from day one on supercomputers that leverage new technologies requires significant preparation. An institution that procures a novel system architecture often lacks sufficient institutional knowledge and skills to prepare for it. Thus, the "Center of Excellence" (CoE) concept has emerged to prepare for systems such as Summit and Sierra, currently the top two systems in the Top 500. This paper documents CoE experiences that prepared a workload of diverse applications and math libraries for a heterogeneous system. We describe our approach to this preparation, including our management and execution strategies, and detail our experiences with and reasons for using different programming approaches. Our early science and performance results show that the project enabled significant early seismic science with up to a 14X throughput increase over Cori. In addition to our successes, we discuss our challenges and failures so others may benefit from our experience.

Details

Database :
OpenAIRE
Journal :
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Accession number :
edsair.doi...........4e88bb7115e2e0999663b946908ca92b
Full Text :
https://doi.org/10.1145/3295500.3356192