
G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems

Authors:
Xiao, Youshao
Zhao, Shangchun
Zhou, Zhenglei
Huan, Zhaoxin
Ju, Lin
Zhang, Xiaolu
Wang, Lin
Zhou, Jun
Publication Year:
2024

Abstract

Recently, a new paradigm, meta learning, has been widely applied to Deep Learning Recommendation Models (DLRM), significantly improving statistical performance, especially in cold-start scenarios. However, existing systems are not tailored to meta learning-based DLRM models and suffer from critical efficiency problems in distributed training on GPU clusters, because the conventional deep learning pipeline is not optimized for the two task-specific datasets and two update loops that meta learning requires. This paper presents G-Meta, a high-performance framework for large-scale training of optimization-based meta DLRM models over GPU clusters. First, G-Meta combines data parallelism and model parallelism, carefully orchestrating computation and communication, to enable high-speed distributed training. Second, it proposes a Meta-IO pipeline for efficient data ingestion that alleviates the I/O bottleneck. Extensive experimental results show that G-Meta achieves notable training speedups without loss of statistical performance. Since early 2022, G-Meta has been deployed in Alipay's core advertising and recommender system, shortening the continuous delivery of models by a factor of four. With the benefit of larger training samples and tasks, it also obtains a 6.48% improvement in Conversion Rate (CVR) and a 1.06% increase in CPM (Cost Per Mille) in Alipay's homepage display advertising.
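For context, the "two task-specific datasets and two update loops" the abstract refers to are the support/query splits and the inner (task adaptation) and outer (meta) updates of optimization-based meta learning. The following is a minimal PyTorch sketch of that generic two-loop structure only; the toy linear model, tensor names, and random data are illustrative assumptions and not G-Meta's actual implementation.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 1)                       # stand-in for a DLRM tower
meta_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
inner_lr = 0.1
loss_fn = torch.nn.MSELoss()

for step in range(3):                               # outer (meta) update loop
    # Each task supplies two task-specific datasets: support and query.
    support_x, support_y = torch.randn(16, 8), torch.randn(16, 1)
    query_x, query_y = torch.randn(16, 8), torch.randn(16, 1)

    # Inner loop: adapt a copy of the parameters on the support set.
    params = list(model.parameters())
    support_loss = loss_fn(model(support_x), support_y)
    grads = torch.autograd.grad(support_loss, params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(params, grads)]

    # Outer loop: evaluate the adapted parameters on the query set and
    # back-propagate through the inner update into the meta parameters.
    query_pred = query_x @ adapted[0].t() + adapted[1]
    query_loss = loss_fn(query_pred, query_y)
    meta_opt.zero_grad()
    query_loss.backward()
    meta_opt.step()
```

Note that the outer step back-propagates through the inner gradient update (via `create_graph=True`), so each meta step touches both datasets and both loops; this doubled data and compute path is what a conventional single-loop DLRM training pipeline is not optimized for.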

Details

Database:
arXiv
Publication Type:
Report
Accession number:
edsarx.2401.04338
Document Type:
Working Paper
Full Text:
https://doi.org/10.1145/3583780.3615208