Large Matrix-Matrix Multiplication on FPGA

Mohammad

8 years ago

Digital System Design with High-Level Synthesis for FPGA: Combinational Circuits

Goal	Implementing a large matrix-matrix multiplication on FPGA
Approach	Using divide-and-conquer techniques to describe the matrix multiplication algorithm and then using SDSoC for high-level synthesis
Benefits	High-performance implementation, short time-to-market design
Credit	This work has been done under the ENPOWER project (funded by EPSRC) at the University of Bristol.

Matrix multiplication is an operator with a wide range of applications in image processing, scientific computing, simulation, robotics, and so on. Providing a fast implementation on a CPU, GPU, or FPGA has therefore always been a challenge.

Here, I briefly explain how to implement this operator on an FPGA.

As FPGAs have limited resources in terms of internal memory and logic, transferring all the data of a large multiplication to the FPGA and then performing the multiplication there is not possible. Therefore, the FPGA has to work together with the main memory to complete the task. However, the high latency of data accesses to the main memory is the main bottleneck in this collaboration. To tackle this problem, the approach explained here tries to minimize this collaboration overhead.

The key idea is to use the divide-and-conquer technique. The following figure shows this approach: matrix A is divided horizontally into row panels and matrix B is divided vertically into column panels.
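In code terms, the same partitioning can be sketched in plain C++ as follows. This is only a software sketch of the idea, not the SDSoC implementation: the names `multiply_block` and `blocked_matmul`, the row-major `std::vector<float>` layout, and the single `tile` parameter are assumptions made for illustration. Each call to `multiply_block` combines one row panel of A with one column panel of B, which is the unit of work that would be small enough to hand to the FPGA.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes the block of C that results from one
// row panel of A (rows row0 .. row0+rows-1) and one column panel of B
// (columns col0 .. col0+cols-1). All matrices are stored row-major.
// A is n x k, B is k x m, C is n x m.
void multiply_block(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, std::size_t k, std::size_t m,
                    std::size_t row0, std::size_t rows,
                    std::size_t col0, std::size_t cols) {
    for (std::size_t i = row0; i < row0 + rows; ++i) {
        for (std::size_t j = col0; j < col0 + cols; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * m + j];
            }
            C[i * m + j] = acc;
        }
    }
}

// Divide-and-conquer driver: splits A horizontally and B vertically into
// panels of at most `tile` rows/columns and multiplies every pair of panels.
// `tile` must be greater than zero.
std::vector<float> blocked_matmul(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::size_t n, std::size_t k, std::size_t m,
                                  std::size_t tile) {
    std::vector<float> C(n * m, 0.0f);
    for (std::size_t row0 = 0; row0 < n; row0 += tile) {
        const std::size_t rows = std::min(tile, n - row0);
        for (std::size_t col0 = 0; col0 < m; col0 += tile) {
            const std::size_t cols = std::min(tile, m - col0);
            multiply_block(A, B, C, k, m, row0, rows, col0, cols);
        }
    }
    return C;
}
```

For example, `blocked_matmul(A, B, n, k, m, 64)` would process the product in blocks of at most 64 x 64 output elements. Because each block of C depends only on one panel of A and one panel of B, only those two panels need to be resident in the FPGA's internal memory at a time.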