
Improving Deep Learning with a customizable GPU-like FPGA-based accelerator

Authors :
Alessandro Cilardo
Edoardo Fusella
Mirko Gagliardi
Source :
PRIME
Publication Year :
2018
Publisher :
IEEE, 2018.

Abstract

An ever-increasing number of challenging applications are being approached using Deep Learning, obtaining impressive results in a variety of different domains. However, state-of-the-art accuracy requires deep neural networks with a large number of layers and a huge number of different filters with millions of weights. GPU- and FPGA-based architectures have been proposed as possible solutions to meet this enormous demand for computing resources. In this paper, we investigate the adoption of different architectural features, i.e., the SIMD paradigm, multithreading, and non-coherent on-chip memory, for Deep Learning-oriented FPGA-based accelerator designs. Experimental results on a Xilinx Virtex-7 FPGA show that the SIMD paradigm and multithreading can improve execution time by up to $5\times$ and $3.5\times$, respectively. A further enhancement of up to $1.75\times$ can be obtained using a non-coherent on-chip memory.
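The SIMD and multithreading speedups reported above come from executing many of the multiply-accumulate operations of a convolution filter in parallel. The following C sketch is purely illustrative and is not taken from the paper: the lane width (LANES), filter size (WEIGHTS), and all variable names are assumptions chosen only to show how grouping a filter's dot product into SIMD-width lane bundles reduces the number of loop iterations a datapath must issue.

```c
/* Illustrative sketch only: a lane-parallel multiply-accumulate loop of the
 * kind a SIMD datapath could execute for one convolution filter.
 * LANES and WEIGHTS are hypothetical values, not figures from the paper. */
#include <stdio.h>

#define LANES   4    /* assumed SIMD width of the accelerator datapath */
#define WEIGHTS 16   /* assumed filter size (must be a multiple of LANES) */

int main(void) {
    float w[WEIGHTS], x[WEIGHTS];
    for (int i = 0; i < WEIGHTS; i++) { w[i] = 0.5f; x[i] = (float)i; }

    /* Each outer iteration issues LANES multiply-accumulates at once,
     * mimicking how a SIMD lane group cuts the cycle count of the
     * filter's dot product by roughly a factor of LANES. */
    float acc[LANES] = {0.0f};
    for (int i = 0; i < WEIGHTS; i += LANES)
        for (int l = 0; l < LANES; l++)
            acc[l] += w[i + l] * x[i + l];

    /* Final horizontal reduction across the lane accumulators. */
    float sum = 0.0f;
    for (int l = 0; l < LANES; l++) sum += acc[l];
    printf("dot product = %f\n", sum);
    return 0;
}
```

In the same spirit, multithreading would interleave several such filter computations to hide memory latency; the paper evaluates these features on a Xilinx Virtex-7 FPGA, whereas this snippet only conveys the general idea in software.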

Details

Database :
OpenAIRE
Journal :
2018 14th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME)
Accession number :
edsair.doi.dedup.....1aeba6c91378b7dcfc1a6ef5c0b0365c
Full Text :
https://doi.org/10.1109/prime.2018.8430335