
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale

Authors :
Ping Tak Peter Tang
Raghuraman Krishnamoorthi
Daya Shanker Khudia
Nadathur Satish
Jongsoo Park
Hector Yuen
Jianyu Huang
Maxim Naumov
Ellie Wen
Mikhail Smelyanskiy
Xiaohan Wei
Sam Naghshineh
Dhruv Choudhary
Jie Yang
Changkyu Kim
Haixin Liu
Zhaoxia Deng
Carole-Jean Wu
Source :
IEEE Micro. 41:93-100
Publication Year :
2021
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2021.

Abstract

The tremendous success of machine learning (ML) and the unabated growth in model complexity have motivated many ML-specific hardware architecture designs to speed up model inference. While these architectures are diverse, highly optimized low-precision arithmetic is a component shared by most of them. Nevertheless, the recommender systems important to Facebook’s personalization services are demanding and complex: they must serve billions of users per month responsively, with low latency, while maintaining high prediction accuracy. Do these low-precision architectures work well with our production recommendation systems? They do, but not without significant effort. In this article, we share our search strategies for adapting reference recommendation models to low-precision hardware, our optimization of low-precision compute kernels, and the toolchain used to maintain our models’ accuracy throughout their lifespan. We believe our lessons from the trenches can promote better codesign between hardware architecture and software engineering, and advance the state of the art of ML in industry.
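The paper itself describes the authors' quantization and kernel work; purely as a rough illustration of what "low-precision" means for recommendation models, the sketch below applies row-wise uint8 quantization (a per-row scale and offset) to an embedding table, the memory-dominated structure such models depend on. This is a minimal, hypothetical example: the function names and parameters are not taken from the paper or from any production kernel library.

```python
# Hypothetical sketch of row-wise uint8 quantization for an embedding table.
# Illustrative only; names and parameters are not from the paper.
import numpy as np

def quantize_rows_uint8(table: np.ndarray):
    """Quantize each embedding row to uint8 with a per-row scale and offset."""
    row_min = table.min(axis=1, keepdims=True)
    row_max = table.max(axis=1, keepdims=True)
    scale = (row_max - row_min) / 255.0
    scale = np.where(scale == 0.0, 1.0, scale)  # guard rows with constant values
    q = np.clip(np.round((table - row_min) / scale), 0, 255).astype(np.uint8)
    return q, scale.astype(np.float32), row_min.astype(np.float32)

def dequantized_lookup(q, scale, row_min, indices):
    """Gather quantized rows and reconstruct approximate fp32 embeddings."""
    rows = q[indices].astype(np.float32)
    return rows * scale[indices] + row_min[indices]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.standard_normal((1000, 64)).astype(np.float32)  # toy embedding table
    q, scale, row_min = quantize_rows_uint8(table)
    approx = dequantized_lookup(q, scale, row_min, np.array([3, 17, 999]))
    print("max abs error:", np.abs(approx - table[[3, 17, 999]]).max())
```

Storing one byte per element plus per-row metadata roughly quarters the memory footprint of an fp32 table, which is why this style of quantization is a common starting point for low-precision recommendation inference; the accuracy implications are exactly the kind of issue the article's toolchain discussion addresses.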

Details

ISSN :
1937-4143 (electronic) and 0272-1732 (print)
Volume :
41
Database :
OpenAIRE
Journal :
IEEE Micro
Accession number :
edsair.doi...........fcd9d56841f7bd8d51b6f6565668bb69