Sparse-YOLO: Hardware/software co-design of an FPGA accelerator for YOLOv2

Z Wang, K Xu, S Wu, L Liu, L Liu, D Wang - IEEE Access, 2020 - ieeexplore.ieee.org
Convolutional neural network (CNN) based object detection algorithms are becoming dominant in many application fields because they are more accurate than traditional schemes. Among them, You Only Look Once (YOLO) is one of the most popular detection frameworks, offering one of the best trade-offs between speed and accuracy. However, because of the intrinsically high computational workload of CNNs, high-throughput processing at low energy cost remains challenging. In this paper, we propose a hardware/software (HW/SW) co-design methodology targeting CPU+FPGA-based heterogeneous platforms. First, we extend a novel sparse convolution algorithm to the YOLOv2 framework and develop a resource-efficient FPGA accelerator architecture based on asynchronously executed parallel convolution cores. Second, we introduce algorithm-level optimization schemes, including hardware-aware neural network pruning, clustering, and quantization, which reduce the computational workload of the YOLOv2 algorithm by a factor of 7. Finally, we present an end-to-end design space exploration flow for FPGA-based accelerator design, and study and implement two HW/SW partition strategies. Experimental results show that our design achieves a peak throughput of 2.13 TOPS (72.5 fps) on an Intel Arria-10 GX1150 FPGA at a working frequency of 211 MHz, with a detection accuracy of 74.45 on the PASCAL VOC2007 dataset.
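The abstract does not detail how pruning, quantization, and sparse convolution interact, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes plain magnitude pruning and symmetric 8-bit quantization (both swapped in for illustration; the paper's hardware-aware schemes and its clustering step are not reproduced), and all function names are hypothetical. It shows why zeroed weights translate into skipped multiply-accumulate operations.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Assumption: simple magnitude pruning, used here only for illustration.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric uniform quantization to the signed 8-bit range [-127, 127].

    Returns the integer weights and the scale needed to dequantize them.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    return [round(w / scale) for w in weights], scale


def sparse_dot(q_weights, activations):
    """Multiply-accumulate that skips zero weights.

    Returns the accumulated sum and the number of MACs actually performed.
    """
    acc = 0
    macs = 0
    for w, a in zip(q_weights, activations):
        if w != 0:
            acc += w * a
            macs += 1
    return acc, macs


if __name__ == "__main__":
    weights = [0.5, -0.01, 0.02, -1.0, 0.03, 0.25, -0.04, 0.0]
    pruned = prune_by_magnitude(weights, sparsity=0.5)
    q_weights, scale = quantize_int8(pruned)
    total, macs = sparse_dot(q_weights, [1] * len(q_weights))
    print(f"MACs performed: {macs} of {len(weights)}")
```

In this toy case half of the weights are pruned, so the zero-skipping loop performs half of the dense multiply-accumulates. A hardware accelerator would exploit the same property with a compressed weight format rather than a per-element branch.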