Improving Deep Learning with a customizable GPU-like FPGA-based accelerator
- Author
Alessandro Cilardo, Edoardo Fusella, Mirko Gagliardi
- Subjects
Computer science, Deep learning, Parallel computing, Execution time, Multithreading, Deep neural networks, Artificial intelligence, SIMD, Field-programmable gate array
- Abstract
An ever-increasing number of challenging applications are being approached using Deep Learning, obtaining impressive results in a variety of different domains. However, state-of-the-art accuracy requires deep neural networks with a large number of layers and a huge number of different filters with millions of weights. GPU- and FPGA-based architectures have been proposed as possible solutions to meet this enormous demand for computing resources. In this paper, we investigate the adoption of different architectural features, i.e., the SIMD paradigm, multithreading, and non-coherent on-chip memory, for Deep Learning oriented FPGA-based accelerator designs. Experimental results on a Xilinx Virtex-7 FPGA show that the SIMD paradigm and multithreading can improve execution time by up to $5\times$ and $3.5\times$, respectively. A further enhancement of up to $1.75\times$ can be obtained using a non-coherent on-chip memory.
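As a rough, purely illustrative sketch of the SIMD idea the abstract refers to (the record does not describe the accelerator's actual microarchitecture), the C fragment below contrasts a scalar multiply-accumulate with a lane-parallel variant that produces several adjacent convolution outputs per pass. The names LANES, mac_scalar, and mac_simd_lanes are hypothetical and introduced here only for illustration; they do not come from the paper.

```c
/* Illustrative sketch only: shows the general idea of computing several
 * convolution outputs per step (SIMD lanes) versus one at a time.
 * Not the paper's design; all names here are hypothetical. */
#include <stddef.h>

#define LANES 4  /* assumed SIMD width, for illustration only */

/* Scalar multiply-accumulate over a filter of n taps. */
float mac_scalar(const float *in, const float *w, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += in[i] * w[i];
    return acc;
}

/* SIMD-style version: LANES independent accumulators advance in lockstep,
 * each lane handling one adjacent output position.
 * Requires in[] to hold at least n + LANES - 1 elements. */
void mac_simd_lanes(const float *in, const float *w, size_t n, float out[LANES]) {
    for (int l = 0; l < LANES; ++l)
        out[l] = 0.0f;
    for (size_t i = 0; i < n; ++i)
        for (int l = 0; l < LANES; ++l)
            out[l] += in[i + l] * w[i];  /* lane l computes output position l */
}
```

In a vector datapath the inner lane loop would execute as a single wide instruction, which is the source of the execution-time reduction the abstract reports for the SIMD paradigm.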
- Published
- 2018