The future of computing beyond Moore's Law

J Shalf - Philosophical Transactions of the Royal Society …, 2020 - royalsocietypublishing.org
Moore's Law is a techno-economic model that has enabled the information technology
industry to double the performance and functionality of digital electronics roughly every 2 …

A full-stack search technique for domain optimized deep learning accelerators

D Zhang, S Huda, E Songhori, K Prabhu, Q Le… - Proceedings of the 27th …, 2022 - dl.acm.org
The rapidly-changing deep learning landscape presents a unique opportunity for building
inference accelerators optimized for specific datacenter-scale workloads. We propose Full …

Towards general purpose acceleration by exploiting common data-dependence forms

V Dadu, J Weng, S Liu, T Nowatzki - … of the 52nd Annual IEEE/ACM …, 2019 - dl.acm.org
With slowing technology scaling, specialized accelerators are increasingly attractive
solutions to continue expected generational scaling of performance. However, in order to …

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …

Reinforcement learning approach for mapping applications to dataflow-based coarse-grained reconfigurable array

AXM Chang, P Khopkar, B Romanous… - arXiv preprint arXiv …, 2022 - arxiv.org
The Streaming Engine (SE) is a Coarse-Grained Reconfigurable Array which provides
programming flexibility and high-performance with energy efficiency. An application program …

Evaluating emerging AI/ML accelerators: IPU, RDU, and NVIDIA/AMD GPUs

H Peng, C Ding, T Geng, S Choudhury… - Companion of the 15th …, 2024 - dl.acm.org
The relentless advancement of artificial intelligence (AI) and machine learning (ML)
applications necessitates the development of specialized hardware accelerators capable of …

DAP: A 507-GMACs/J 256-Core Domain Adaptive Processor for Wireless Communication and Linear Algebra Kernels in 12-nm FinFET

KY Chen, CS Yang, YH Sun, CW Tseng… - IEEE Journal of Solid …, 2024 - ieeexplore.ieee.org
We present domain adaptive processor (DAP), a programmable systolic-array processor
designed for wireless communication and linear algebra workloads. DAP uses a globally …

FCNNLib: A flexible convolution algorithm library for deep learning on FPGAs

Y Liang, Q Xiao, L Lu, J Xie - IEEE Transactions on Computer …, 2021 - ieeexplore.ieee.org
Convolution features huge complexity and demands high computation capability. Among
hardware platforms, field programmable gate array (FPGA) emerges as a promising solution …

Highly parameterised CGRA architecture for design space exploration of machine learning applications onboard satellites

L Zulberti, M Monopoli, P Nannipieri… - 2023 European Data …, 2023 - ieeexplore.ieee.org
The adoption of Machine Learning solutions directly onboard in satellite missions is
becoming more and more attractive for the space sector. Among the various kinds of …

Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication

KY Chen, TM Nelson, A Khadem, M Fayazi… - ACM Transactions on …, 2024 - dl.acm.org
Stream processing, which involves real-time computation of data as it is created or received,
is vital for various applications, specifically wireless communication. The evolving protocols …