Enhancing CNN Deployment Efficiency on Mobile Devices with FPGA-based Accelerators: Memory-based Convolution and Resource Optimization
Author
Dr. Agfianto Eko Putra, M.Si. (1); Oskar Natan, S.ST., M.Tr.T., Ph.D. (2); Prof. Dr. Ir. Jazi Eko Istiyanto, M.Sc. (3)
Date
January 2024
Keywords
Abstract
The limited computation resources of edge devices constrain the deployment of convolutional neural network (CNN) models, leading to slow inference. To address this issue, a field-programmable gate array (FPGA)-based accelerator is used as the deployment medium, as it offers distributed arithmetic calculations and efficient memory components. Memory-based convolution simplifies deployment across a wide range of FPGAs by eliminating the need for high-end components such as digital signal processor (DSP) units. Furthermore, the number of processing elements matches the kernel size of the CNN architecture. In this research, we use an FPGA to accelerate the inference of a CNN model; specifically, we implement an 8-bit convolution followed by a ReLU activation function on a Xilinx Artix-7 FPGA board. The experimental results show a resource utilization of 647 slice look-up tables (LUTs, 1.02%), 474 slice flip-flops (FFs, 0.37%), 8 input/output (IO) blocks (3.81%), and 2 global clock buffers (BUFGs, 6.25%), with a maximum attainable frequency of 68.96 MHz. These figures demonstrate how drastically the memory-based approach reduces FPGA resource utilization.
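As a rough illustration of the operations described in the abstract, the following C sketch combines memory-based multiplication (a 256-entry product table per kernel weight, standing in for the FPGA's memory components in place of DSP multipliers) with the 8-bit convolution and ReLU. The feature-map and kernel sizes, the requantization shift, and names such as conv2d_relu_int8 and build_mul_luts are assumptions made for the sketch, not details from the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sizes; the paper does not specify feature-map or kernel
 * dimensions, so these are assumptions for the sketch. */
#define IN_H  8
#define IN_W  8
#define K     3
#define OUT_H (IN_H - K + 1)
#define OUT_W (IN_W - K + 1)

/* Memory-based multiplication: for each fixed 8-bit kernel weight, its
 * product with every possible signed 8-bit input is precomputed into a
 * 256-entry table. The convolution then needs only table lookups and
 * additions, which is the idea behind avoiding DSP units. */
static int16_t mul_lut[K][K][256];

static void build_mul_luts(const int8_t kernel[K][K])
{
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            for (int x = -128; x <= 127; x++)
                mul_lut[i][j][(uint8_t)x] =
                    (int16_t)kernel[i][j] * (int16_t)x;
}

/* 8-bit convolution followed by ReLU: products come from the lookup
 * tables, are accumulated in 32 bits to avoid overflow, clamped by the
 * ReLU, then shifted and saturated back to 8 bits. The shift amount is
 * an assumed quantization parameter. */
static void conv2d_relu_int8(int8_t in[IN_H][IN_W],
                             int8_t out[OUT_H][OUT_W], int shift)
{
    for (int r = 0; r < OUT_H; r++) {
        for (int c = 0; c < OUT_W; c++) {
            int32_t acc = 0;
            /* One lookup-and-add per kernel element; these K*K
             * operations correspond to the accelerator's processing
             * elements, whose count matches the kernel size. */
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    acc += mul_lut[i][j][(uint8_t)in[r + i][c + j]];
            if (acc < 0)                    /* ReLU */
                acc = 0;
            acc >>= shift;                  /* requantize to 8 bits */
            out[r][c] = (int8_t)(acc > 127 ? 127 : acc);
        }
    }
}

int main(void)
{
    const int8_t kernel[K][K] = { { 1, 0, -1 },
                                  { 1, 0, -1 },
                                  { 1, 0, -1 } };
    int8_t in[IN_H][IN_W], out[OUT_H][OUT_W];

    /* Fill the input with a simple ramp pattern for demonstration. */
    for (int r = 0; r < IN_H; r++)
        for (int c = 0; c < IN_W; c++)
            in[r][c] = (int8_t)(r * IN_W + c);

    build_mul_luts(kernel);
    conv2d_relu_int8(in, out, 4);

    for (int r = 0; r < OUT_H; r++) {
        for (int c = 0; c < OUT_W; c++)
            printf("%4d ", out[r][c]);
        printf("\n");
    }
    return 0;
}
```

On the actual accelerator, the K*K lookup-and-add steps would run in parallel across the processing elements rather than sequentially in a software loop; this sketch only shows the arithmetic being mapped onto memory lookups.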