Adventures with Grace Hopper AI Super Chip and the National Research Platform

Authors :
Hurt, J. Alex
Scott, Grant J.
Weitzel, Derek
Zhu, Huijun
Publication Year :
2024

Abstract

The National Science Foundation (NSF)-funded National Research Platform (NRP) is a hyper-converged cluster of nationally and globally interconnected heterogeneous computing resources. The dominant computing environment of the NRP is the x86_64 instruction set architecture (ISA), often with graphics processing units (GPUs). Researchers across the nation leverage containers and Kubernetes to execute high-throughput computing (HTC) workloads across the heterogeneous cyberinfrastructure with minimal friction and maximum flexibility. As part of the NSF-funded GP-ENGINE project, we stood up the first server for the NRP with an NVIDIA Grace Hopper AI Chip (GH200), which uses the alternative ARM ISA. This presents challenges, as containers must be built specifically for ARM rather than x86_64. Herein, we describe the challenges encountered, as well as our resulting solutions and some relevant performance benchmarks. We specifically compare the GH200 to the A100 for computer vision workloads within compute nodes in the NRP.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2410.16487
Document Type :
Working Paper