FPGA-specific synthesis of loop-nests with pipelined computational cores

V Sklyarov, I Skliarova - Microprocessors and Microsystems, 2014 - Elsevier

The paper is dedicated to fast FPGA-based hardware accelerators that implement sorting
networks. The primary emphasis is on the uniformity of core components, feasible …

Cited by 65 · Related articles · All 4 versions

On how to accelerate iterative stencil loops: a scalable streaming-based approach

R Cattaneo, G Natale, C Sicignano, D Sciuto… - ACM Transactions on …, 2015 - dl.acm.org

In high-performance systems, stencil computations play a crucial role as they appear in a
variety of different fields of application, ranging from partial differential equation solving, to …

Cited by 49 · Related articles · All 5 versions

Bridging high-level synthesis and application-specific arithmetic: The case study of floating-point summations

Y Uguen, F de Dinechin… - 2017 27th International …, 2017 - ieeexplore.ieee.org

FPGAs are well known for their ability to perform non-standard computations not supported
by classical microprocessors. Many libraries of highly customizable application-specific IPs …

Cited by 18 · Related articles · All 7 versions

Towards scalable and efficient FPGA stencil accelerators

G Deest, N Estibals, T Yuki, S Derrien… - IMPACT'16-6th …, 2016 - inria.hal.science

In this paper we propose a design template for stencil computations targeting FPGA-based
accelerators. The goal for our design is to provide scalable high throughput designs that can …

Cited by 15 · Related articles · All 6 versions

Data-aware process networks

C Alias, A Plesco - Proceedings of the 30th ACM SIGPLAN International …, 2021 - dl.acm.org

With the emergence of reconfigurable FPGA circuits as a credible alternative to GPUs for
HPC acceleration, new compilation paradigms are required to map high-level algorithmic …

Cited by 15 · Related articles · All 8 versions

C-to-coram: Compiling perfect loop nests to the portable coram abstraction

G Weisz, JC Hoe - Proceedings of the ACM/SIGDA international …, 2013 - dl.acm.org

This paper presents initial work on developing a C compiler for the CoRAM FPGA computing
abstraction. The presented effort focuses on compiling fixed-bound perfect loop nests that …

Cited by 14 · Related articles · All 9 versions

Low‐precision DSP‐based floating‐point multiply‐add fused for Field Programmable Gate Arrays

A Amaricai, O Boncalo… - IET Computers & Digital …, 2014 - Wiley Online Library

Floating‐point (FP) multiply‐add fused (F1* F2±F3) and multiply‐accumulate represent the
most common arithmetic operation in a wide range of applications, such as graphic …

Cited by 10 · Related articles · All 10 versions

Processor arrays generation for matrix algorithms used in embedded platforms implemented on FPGAs

R Perez-Andrade, C Torres-Huitzil… - Microprocessors and …, 2015 - Elsevier

Matrix algorithms are an important part of many digital signal processing applications as
they are core kernels that are usually required to be applied many times while computing …

Cited by 7 · Related articles · All 5 versions

High speed half-precision floating-point fused multiply and add unit using DSP blocks

SS Ganesh, JJJ Nesam… - 2020 First International …, 2020 - ieeexplore.ieee.org

Necessity of multiplication followed by the addition in numerous digital signal processing
applications demands Fused Multiply and Add (FMA) unit for computations. This FMA design …

Cited by 3 · Related articles

[PDF] Scalable Trace-based Compile-Time Memory Allocation

PPR CLAUSS - 2024 - perso.ens-lyon.fr

High-Level Synthesis (HLS)[23, 11, 21, 7] consists in compiling a circuit from a high-level
program. With HLS, there is no runtime, every scheduling and allocation decision from high …
