A high-performance accelerator for floating-point matrix multiplication

X Jia, G Wu, X Xie - … on Parallel and Distributed Processing with …, 2017 - ieeexplore.ieee.org
Matrix multiplication is a widely used routine in science and engineering applications.
Accelerating this routine is important, because applications with large-scale matrix …

An efficient method of parallel multiplication on a single DSP slice for embedded FPGAs

Z Huang, S Zhang, W Wang - IEEE Access, 2019 - ieeexplore.ieee.org
Field-programmable gate arrays (FPGAs) can efficiently implement custom applications via
their embedded digital signal processor (DSP) slices, including binary multipliers. An …

Optimal dynamic data layouts for 2D FFT on 3D memory integrated FPGA

R Chen, SG Singapura, VK Prasanna - The Journal of Supercomputing, 2017 - Springer
FPGAs have been widely used for accelerating various applications. For many data-
intensive applications, the memory bandwidth limits the performance. 3D memories with …

On-chip memory efficient data layout for 2D FFT on 3D memory integrated FPGA

SG Singapura, R Kannan… - 2016 IEEE High …, 2016 - ieeexplore.ieee.org
3D memories are becoming viable solutions for the memory wall problem and meeting the
bandwidth requirements of memory-intensive applications. The high bandwidth provided by …