1. Demand MemCpy: Overlapping of Computation and Data Transfer for Heterogeneous Computing
- Author
- Donghun Jeong, Jihun Park, and Jungrae Kim
- Subjects
- Computer architecture, GP-GPU, accelerator, memory synchronization, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
- Heterogeneous computing relies on different types of processors collaborating on shared data. In systems with discrete accelerators (e.g., GP-GPUs), sharing data requires transferring large amounts of data between CPU and accelerator memories, which can significantly increase end-to-end execution time. This paper proposes a novel mechanism called Demand MemCpy (DMC) to hide this data-sharing overhead. DMC copies data from host memory to accelerator memory on demand at page granularity. It uses a hardware-only mechanism to fetch the requested page with short latency and a background pre-copy to fetch related pages in advance. Our evaluation shows that DMC can reduce the end-to-end execution time of GP-GPU applications by 25.4% on average by overlapping computation with data transfer and by not transferring unused pages.
- Published
- 2022
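
The abstract's description of copying on demand at page granularity, combined with a background pre-copy, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware mechanism: the class `DemandCopyModel`, the 4 KiB `PAGE_SIZE`, the `prefetch_ahead` depth, and the choice of "the next sequential pages" as the related pages are all hypothetical, and the pre-copy runs synchronously here rather than in the background.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # bytes; assumed page size, not taken from the paper


@dataclass
class DemandCopyModel:
    """Toy model of page-granularity demand copy (hypothetical, software only)."""

    host: bytes                      # stands in for host (CPU) memory
    prefetch_ahead: int = 2          # assumed pre-copy depth, in pages
    resident: dict = field(default_factory=dict)  # page number -> bytes on the accelerator
    demand_fetches: int = 0          # pages copied because an access missed
    precopied: int = 0               # pages copied ahead of any access

    def _num_pages(self) -> int:
        return (len(self.host) + PAGE_SIZE - 1) // PAGE_SIZE

    def _copy_page(self, page: int) -> None:
        start = page * PAGE_SIZE
        self.resident[page] = self.host[start:start + PAGE_SIZE]

    def read(self, addr: int) -> int:
        """Return the byte at addr, copying its page first if it is not resident."""
        page = addr // PAGE_SIZE
        if page not in self.resident:
            self.demand_fetches += 1
            self._copy_page(page)
        # Pre-copy the following pages. Treating sequential neighbours as the
        # "related pages" is an assumption made for this sketch.
        last = min(page + 1 + self.prefetch_ahead, self._num_pages())
        for nxt in range(page + 1, last):
            if nxt not in self.resident:
                self.precopied += 1
                self._copy_page(nxt)
        return self.resident[page][addr % PAGE_SIZE]


# Usage: an 8-page buffer of which the "kernel" touches only the first two pages.
model = DemandCopyModel(host=bytes(range(256)) * 128)
model.read(0)
model.read(4097)
print(model.demand_fetches, model.precopied, len(model.resident))
```

In this model, pages that are never accessed and never pre-copied are simply not transferred, which mirrors the abstract's point about skipping unused pages. The overlap of computation with transfer is not modeled.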