R Cattaneo, G Natale, C Sicignano, D Sciuto… - ACM Transactions on …, 2015 - dl.acm.org
In high-performance systems, stencil computations play a crucial role as they appear in a variety of different fields of application, ranging from partial differential equation solving, to …
FPGAs are well known for their ability to perform non-standard computations not supported by classical microprocessors. Many libraries of highly customizable application-specific IPs …
In this paper we propose a design template for stencil computations targeting FPGA-based accelerators. The goal for our design is to provide scalable high throughput designs that can …
C Alias, A Plesco - Proceedings of the 30th ACM SIGPLAN International …, 2021 - dl.acm.org
With the emergence of reconfigurable FPGA circuits as a credible alternative to GPUs for HPC acceleration, new compilation paradigms are required to map high-level algorithmic …
G Weisz, JC Hoe - Proceedings of the ACM/SIGDA international …, 2013 - dl.acm.org
This paper presents initial work on developing a C compiler for the CoRAM FPGA computing abstraction. The presented effort focuses on compiling fixed-bound perfect loop nests that …
A Amaricai, O Boncalo… - IET Computers & Digital …, 2014 - Wiley Online Library
Floating‐point (FP) multiply‐add fused (F1 × F2 ± F3) and multiply‐accumulate represent the most common arithmetic operation in a wide range of applications, such as graphic …
R Perez-Andrade, C Torres-Huitzil… - Microprocessors and …, 2015 - Elsevier
Matrix algorithms are an important part of many digital signal processing applications as they are core kernels that are usually required to be applied many times while computing …
SS Ganesh, JJJ Nesam… - 2020 First International …, 2020 - ieeexplore.ieee.org
The need for multiplication followed by addition in numerous digital signal processing applications demands a Fused Multiply and Add (FMA) unit for computations. This FMA design …
High-Level Synthesis (HLS) [23, 11, 21, 7] compiles a circuit from a high-level program. With HLS, there is no runtime: every scheduling and allocation decision from high …