FPGA HLS today: successes, challenges, and opportunities

J Cong, J Lau, G Liu, S Neuendorffer, P Pan… - ACM Transactions on …, 2022 - dl.acm.org
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it
went from prototyping to deployment. A decade later, in this article, we assess the progress …

Programming and synthesis for software-defined FPGA acceleration: status and future prospects

YH Lai, E Ustun, S Xiang, Z Fang, H Rong… - ACM Transactions on …, 2021 - dl.acm.org
FPGA-based accelerators are increasingly popular across a broad range of applications,
because they offer massive parallelism, high energy efficiency, and great flexibility for …

Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration

H Genc, S Kim, A Amid, A Haj-Ali, V Iyer… - 2021 58th ACM/IEEE …, 2021 - ieeexplore.ieee.org
DNN accelerators are often developed and evaluated in isolation without considering the
cross-stack, system-level effects in real-world environments. This makes it difficult to …

ScaleHLS: A new scalable high-level synthesis framework on multi-level intermediate representation

H Ye, C Hao, J Cheng, H Jeong… - … symposium on high …, 2022 - ieeexplore.ieee.org
High-level synthesis (HLS) has been widely adopted as it significantly improves the
hardware design productivity and enables efficient design space exploration (DSE). Existing …

TensorIR: An abstraction for automatic tensorized program optimization

S Feng, B Hou, H Jin, W Lin, J Shao, R Lai… - Proceedings of the 28th …, 2023 - dl.acm.org
Deploying deep learning models on various devices has become an important topic. The
wave of hardware specialization brings a diverse set of acceleration primitives for multi …

HeteroCL: A multi-paradigm programming infrastructure for software-defined reconfigurable computing

YH Lai, Y Chi, Y Hu, J Wang, CH Yu, Y Zhou… - Proceedings of the …, 2019 - dl.acm.org
With the pursuit of improving compute performance under strict power constraints, there is
an increasing need for deploying applications to heterogeneous hardware architectures with …

Gemmini: An agile systolic array generator enabling systematic evaluations of deep-learning architectures

H Genc, A Haj-Ali, V Iyer, A Amid, H Mao… - arXiv preprint arXiv …, 2019 - alonamid.github.io
Advances in deep learning and neural networks have resulted in rapid development of
hardware accelerators that support them. A large majority of ASIC accelerators, however …

Optimizing deep neural networks on intelligent edge accelerators via flexible-rate filter pruning

G Li, X Ma, X Wang, H Yue, J Li, L Liu, X Feng… - Journal of Systems …, 2022 - Elsevier
While deep learning has shown superior performance in various intelligent tasks, it is still a
challenging problem to deploy sophisticated models on resource-limited edge devices. Filter …

Remote power attacks on the Versatile Tensor Accelerator in multi-tenant FPGAs

S Tian, S Moini, A Wolnikowski… - 2021 IEEE 29th …, 2021 - ieeexplore.ieee.org
Architectural details of machine learning models are crucial pieces of intellectual property in
many applications. Revealing the structure or types of layers in a model can result in a leak …

DNNVM: End-to-end compiler leveraging heterogeneous optimizations on FPGA-based CNN accelerators

Y Xing, S Liang, L Sui, X Jia, J Qiu, X Liu… - … on Computer-Aided …, 2019 - ieeexplore.ieee.org
The convolutional neural network (CNN) has become a state-of-the-art method for several
artificial intelligence domains in recent years. The increasingly complex CNN models are …