1. Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic
- Author
-
Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.)), Khayyat, Ahmad, Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.)), and Khayyat, Ahmad
- Abstract
The objective of this research is to design an efficient and flexible implementation of parallel matrix multiplication for FPGA devices by analyzing the computation and studying its design space. In order to adapt to the FPGA platform, the design employs blocking and parallelization. Blocked matrix multiplication enables processing arbitrarily large matrices using limited memory capacity, and reduces the bandwidth requirements across the device boundaries by reusing available elements. Exploiting the inherent parallelism in the matrix multiplication computation improves the performance and utilizes the available reconfigurable FPGA resources. The design is constructed by identifying the main design decisions and evaluating the alternatives for each one. The considered design decisions include the scheduling of block transfers, the scheduling of arithmetic operations in a block multiplication, the extent to which the parallelism is exploited, determining the block sizes and shapes, and the use of double buffers for storing matrix blocks. The choices offered by each decision are evaluated analytically in terms of their performance and utilization of FPGA resources. Based on this analysis, a detailed, flexible design that accommodates various alternative design choices is described. The design is optimized for matrices of floating-point elements, and for the FPGA target platform. Prior work is analyzed based on the considered design choices in order to identify the similarities and the differences. The proposed design is implemented using the VHDL hardware description language. The implementation is used to verify the correctness of the design and to confirm the analysis of the design decisions. Correctness is verified both by simulation using the ModelSim logic simulator, and in hardware through compiling the implementation using the Altera Quartus II CAD software and testing it on the Altera DE4 board, featuring a Stratix IV EP4SGX530C2 FPGA device. The implementatio, Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2013-08-03 12:46:13.484
- Published
- 2013