EALI: Energy-aware layer-level scheduling for convolutional neural network inference services on GPUs.

Authors :
Yao, Chunrong
Liu, Wantao
Liu, Zhibing
Yan, Longchuan
Hu, Songlin
Tang, Weiqing
Source :
Neurocomputing. Oct2022, Vol. 507, p265-281. 17p.
Publication Year :
2022

Abstract

The success of convolutional neural networks (CNNs) has made low-latency inference services on Graphics Processing Units (GPUs) a hot research topic. However, GPUs are processors with high power consumption. To minimize energy consumption while meeting the latency Service-Level Objective (SLO), batching and dynamic voltage and frequency scaling (DVFS) are two important techniques. However, existing studies do not coordinate them and treat the CNN as a black box, which makes inference services less energy-efficient. In this paper, we propose EALI, an energy-aware layer-level adaptive scheduling framework that comprises a power prediction model, a layer combination strategy, and an energy-aware layer-level scheduler. The power prediction model uses classic machine learning techniques to predict fine-grained layer-level power consumption. The layer combination strategy combines multiple layers into optimization units to lower scheduling overhead and complexity. The energy-aware layer-level scheduler adaptively coordinates the batching strategy and layer-level DVFS according to the workload to minimize energy consumption while meeting the SLO. Our experimental results on NVIDIA Tesla M40 and V100 GPUs show that, compared to state-of-the-art approaches, EALI decreases energy consumption by up to 36.24% while meeting the SLO. [ABSTRACT FROM AUTHOR]
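
The idea of coordinating batching with DVFS under a latency SLO can be illustrated with a minimal sketch. This is not EALI's scheduler: it treats the whole model as one unit rather than scheduling per layer, and the functions `latency_ms` and `power_w`, their constants, and the candidate batch sizes and frequencies are all hypothetical stand-ins for the learned predictors the abstract describes. It only shows the shape of the search: among (batch size, GPU frequency) pairs that meet the SLO, pick the one with the lowest energy per request.

```python
from itertools import product


def latency_ms(batch, freq_mhz):
    # Hypothetical toy model: latency grows with batch size
    # and shrinks as the GPU clock frequency rises.
    return (2.0 + 1.5 * batch) * (1000.0 / freq_mhz)


def power_w(batch, freq_mhz):
    # Hypothetical toy model: static power plus a dynamic term
    # that grows superlinearly with clock frequency.
    return 50.0 + 0.5 * batch + 150.0 * (freq_mhz / 1000.0) ** 3


def energy_per_request_j(batch, freq_mhz):
    # Energy (joules) = power (watts) * time (seconds), shared across the batch.
    return power_w(batch, freq_mhz) * latency_ms(batch, freq_mhz) / 1000.0 / batch


def pick_config(batches, freqs, slo_ms):
    """Return (energy_per_request_j, batch, freq_mhz) for the feasible
    configuration with the lowest energy per request, or None if no
    configuration meets the latency SLO."""
    best = None
    for batch, freq in product(batches, freqs):
        if latency_ms(batch, freq) > slo_ms:
            continue  # violates the SLO, skip
        energy = energy_per_request_j(batch, freq)
        if best is None or energy < best[0]:
            best = (energy, batch, freq)
    return best


if __name__ == "__main__":
    print(pick_config([1, 2, 4, 8], [600, 800, 1000, 1200], slo_ms=15.0))
```

An exhaustive search is used here only because the toy search space is tiny. A layer-level variant would assign a frequency to each group of layers, which enlarges the search space and is presumably why the abstract mentions combining layers into optimization units.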

Details

Language :
English
ISSN :
09252312
Volume :
507
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
158748375
Full Text :
https://doi.org/10.1016/j.neucom.2022.08.025