
GSPMD: General and Scalable Parallelization for ML Computation Graphs

Authors:
Xu, Yuanzhong
Lee, HyoukJoong
Chen, Dehao
Hechtman, Blake
Huang, Yanping
Joshi, Rahul
Krikun, Maxim
Lepikhin, Dmitry
Ly, Andy
Maggioni, Marcello
Pang, Ruoming
Shazeer, Noam
Wang, Shibo
Wang, Tao
Wu, Yonghui
Chen, Zhifeng
Publication Year:
2021

Abstract

We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations. It allows users to write programs in the same way as for a single device, then give hints through a few annotations on how to distribute tensors, based on which GSPMD will parallelize the computation. Its representation of partitioning is simple yet general, allowing it to express different or mixed paradigms of parallelism on a wide variety of models. GSPMD infers the partitioning for every operator based on limited user annotations, making it convenient to scale existing single-device programs. It solves several technical challenges for production usage, allowing GSPMD to achieve 50% to 62% compute utilization on up to 2048 Cloud TPUv3 cores for models with up to one trillion parameters.
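For illustration only (not part of this record), the annotation workflow the abstract describes can be sketched with JAX, whose jit compiler lowers sharding hints to the XLA GSPMD partitioner. The mesh layout, axis names, and tensor sizes below are assumptions, not details from the paper.

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange the available devices into a 2D logical mesh: a "data" axis for
# batch parallelism and a "model" axis for weight (tensor) parallelism.
# On a single-device host this yields a 1x1 mesh and the program still runs.
devices = np.array(jax.devices()).reshape(-1, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

def layer(x, w):
    y = x @ w
    # One annotation: keep the activation sharded by batch along "data";
    # the partitioner infers shardings for the surrounding operators.
    return jax.lax.with_sharding_constraint(
        y, NamedSharding(mesh, P("data", None)))

x = jnp.ones((8, 128))
w = jnp.ones((128, 256))

# The program is written as for a single device; in_shardings/out_shardings
# are the "few annotations" on how to distribute input and output tensors.
f = jax.jit(
    layer,
    in_shardings=(NamedSharding(mesh, P("data", None)),
                  NamedSharding(mesh, P(None, "model"))),
    out_shardings=NamedSharding(mesh, P("data", None)),
)
print(f(x, w).sharding)

This mirrors the paper's usage model: the function body stays single-device code, and the compiler propagates the user-supplied shardings to every operator in the graph.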

Details

Database:
OAIster
Publication Type:
Electronic Resource
Accession number:
edsoai.on1269549454
Document Type:
Electronic Resource