Combined spatial and temporal blocking for high-performance stencil computation on FPGAs using OpenCL

HR Zohouri, A Podobas, S Matsuoka - Proceedings of the 2018 ACM …, 2018 - dl.acm.org
Recent developments in High Level Synthesis tools have attracted software programmers to
accelerate their high-performance computing applications on FPGAs. Even though it has …

FBLAS: Streaming linear algebra on FPGA

T De Matteis, J de Fine Licht… - … conference for high …, 2020 - ieeexplore.ieee.org
Spatial computing architectures pose an attractive alternative to mitigate control and data
movement overheads typical of load-store architectures. In practice, these devices are rarely …

Shallow water DG simulations on FPGAs: Design and comparison of a novel code generation pipeline

C Alt, T Kenter, S Faghih-Naini, J Faj… - … Conference on High …, 2023 - Springer
FPGAs are receiving increased attention as a promising architecture for accelerators in HPC
systems. Evolving and maturing development tools based on high-level synthesis promise …

OpenCL-based FPGA design to accelerate the nodal discontinuous Galerkin method for unstructured meshes

T Kenter, G Mahale, S Alhaddad… - 2018 IEEE 26th …, 2018 - ieeexplore.ieee.org
The exploration of FPGAs as accelerators for scientific simulations has so far mostly been
focused on small kernels of methods working on regular data structures, for example in the …

Boyi: A systematic framework for automatically deciding the right execution model of OpenCL applications on FPGAs

J Jiang, Z Wang, X Liu, J Gómez-Luna… - Proceedings of the …, 2020 - dl.acm.org
FPGA vendors provide OpenCL software development kits for easier programmability, with
the goal of replacing the time-consuming and error-prone register-transfer level (RTL) …

The strong scaling advantage of FPGAs in HPC for n-body simulations

J Menzel, C Plessl, T Kenter - ACM Transactions on Reconfigurable …, 2021 - dl.acm.org
N-body methods are one of the essential algorithmic building blocks of high-performance
and parallel computing. Previous research has shown promising performance for …

OpenCL implementation of Cannon's matrix multiplication algorithm on Intel Stratix 10 FPGAs

P Gorlani, T Kenter, C Plessl - 2019 International Conference …, 2019 - ieeexplore.ieee.org
Stratix 10 FPGA cards have a good potential for the acceleration of HPC workloads since the
Stratix 10 product line introduces devices with a large number of DSP and memory blocks …

High performance computing with FPGAs and OpenCL

HR Zohouri - arXiv preprint arXiv:1810.09773, 2018 - t2r2.star.titech.ac.jp
With the impending death of Moore's law, the High Performance Computing (HPC)
community is actively exploring new options to satisfy the never-ending need for faster and …

Algorithm-hardware co-design of a discontinuous Galerkin shallow-water model for a dataflow architecture on FPGA

T Kenter, A Shambhu, S Faghih-Naini… - Proceedings of the …, 2021 - dl.acm.org
We present the first FPGA implementation of the full simulation pipeline of a shallow water
code based on the discontinuous Galerkin method. Using OpenCL and following an …

High-performance spectral element methods on field-programmable gate arrays: implementation, evaluation, and future projection

M Karp, A Podobas, N Jansson, T Kenter… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Improvements in computer systems have historically relied on two well-known observations:
Moore's law and Dennard's scaling. Today, both these observations are ending, forcing …