Plasticine: a reconfigurable accelerator for parallel patterns

J Shalf - Philosophical Transactions of the Royal Society …, 2020 - royalsocietypublishing.org

Moore's Law is a techno-economic model that has enabled the information technology
industry to double the performance and functionality of digital electronics roughly every 2 …

Cited by 569 · Related articles · All 14 versions

[PDF] acm.org

A full-stack search technique for domain optimized deep learning accelerators

D Zhang, S Huda, E Songhori, K Prabhu, Q Le… - Proceedings of the 27th …, 2022 - dl.acm.org

The rapidly-changing deep learning landscape presents a unique opportunity for building
inference accelerators optimized for specific datacenter-scale workloads. We propose Full …

Cited by 55 · Related articles · All 3 versions

[PDF] acm.org

Towards general purpose acceleration by exploiting common data-dependence forms

V Dadu, J Weng, S Liu, T Nowatzki - … of the 52nd Annual IEEE/ACM …, 2019 - dl.acm.org

With slowing technology scaling, specialized accelerators are increasingly attractive
solutions to continue expected generational scaling of performance. However, in order to …

Cited by 94 · Related articles · All 7 versions

[PDF] acm.org

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org

With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …

Cited by 11 · Related articles · All 5 versions

[PDF] arxiv.org

Reinforcement learning approach for mapping applications to dataflow-based coarse-grained reconfigurable array

AXM Chang, P Khopkar, B Romanous… - arXiv preprint arXiv …, 2022 - arxiv.org

The Streaming Engine (SE) is a Coarse-Grained Reconfigurable Array which provides
programming flexibility and high-performance with energy efficiency. An application program …

Cited by 7 · Related articles · All 3 versions

[PDF] arxiv.org

Evaluating emerging AI/ML accelerators: IPU, RDU, and NVIDIA/AMD GPUs

H Peng, C Ding, T Geng, S Choudhury… - Companion of the 15th …, 2024 - dl.acm.org

The relentless advancement of artificial intelligence (AI) and machine learning (ML)
applications necessitates the development of specialized hardware accelerators capable of …

Cited by 2 · Related articles · All 3 versions

DAP: A 507-GMACs/J 256-Core Domain Adaptive Processor for Wireless Communication and Linear Algebra Kernels in 12-nm FINFET

KY Chen, CS Yang, YH Sun, CW Tseng… - IEEE Journal of Solid …, 2024 - ieeexplore.ieee.org

We present domain adaptive processor (DAP), a programmable systolic-array processor
designed for wireless communication and linear algebra workloads. DAP uses a globally …

[PDF] ieee.org

FCNNLib: A flexible convolution algorithm library for deep learning on FPGAs

Y Liang, Q Xiao, L Lu, J Xie - IEEE Transactions on Computer …, 2021 - ieeexplore.ieee.org

Convolution features huge complexity and demands high computation capability. Among
hardware platforms, field programmable gate array (FPGA) emerges as a promising solution …

Cited by 9 · Related articles · All 2 versions

[PDF] techrxiv.org

Highly parameterised CGRA architecture for design space exploration of machine learning applications onboard satellites

L Zulberti, M Monopoli, P Nannipieri… - 2023 European Data …, 2023 - ieeexplore.ieee.org

The adoption of Machine Learning solutions directly onboard in satellite missions is
becoming more and more attractive for the space sector. Among the various kinds of …

Cited by 2 · Related articles · All 5 versions

[PDF] acm.org

Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication

KY Chen, TM Nelson, A Khadem, M Fayazi… - ACM Transactions on …, 2024 - dl.acm.org

Stream processing, which involves real-time computation of data as it is created or received,
is vital for various applications, specifically wireless communication. The evolving protocols …
