An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators

H Esmaeilzadeh, S Ghodrati, A Kahng, JK Kim… - ACM Transactions on …, 2024 - dl.acm.org
Parameterizable machine learning (ML) accelerators are the product of recent
breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a …

Physically accurate learning-based performance prediction of hardware-accelerated ml algorithms

H Esmaeilzadeh, S Ghodrati, AB Kahng… - Proceedings of the …, 2022 - dl.acm.org
Parameterizable ML accelerators are the product of recent breakthroughs in machine
learning (ML). To fully enable the design space exploration, we propose a physical-design …

ARCO: Adaptive Multi-Agent Reinforcement Learning-Based Hardware/Software Co-Optimization Compiler for Improved Performance in DNN Accelerator Design

A Fayyazi, M Kamal, M Pedram - arXiv preprint arXiv:2407.08192, 2024 - arxiv.org
This paper presents ARCO, an adaptive Multi-Agent Reinforcement Learning (MARL)-based
co-optimizing compilation framework designed to enhance the efficiency of mapping …

Reusing GEMM hardware for efficient execution of depthwise separable convolution on ASIC-based DNN accelerators

SD Manasi, S Banerjee, A Davare, AA Sorokin… - Proceedings of the 28th …, 2023 - dl.acm.org
Deep learning (DL) accelerators are optimized for standard convolution. However,
lightweight convolutional neural networks (CNNs) use depthwise convolution (DwC) in key …
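GEMM-based accelerators execute standard convolution efficiently because every output channel accumulates over all input channels, yielding large, dense matrix multiplications; depthwise convolution (DwC) filters each channel independently, so far fewer multiply-accumulates are available to fill the same array. A minimal sketch (illustrative only, not code from the paper) of that arithmetic gap:

```python
# Illustrative MAC counts for standard vs. depthwise convolution on one
# layer, showing why a GEMM array sized for standard convolution is
# underutilized by depthwise convolution.
def conv_macs(h, w, k, c_in, c_out):
    # Standard convolution: every output channel sees every input channel.
    return h * w * k * k * c_in * c_out

def depthwise_macs(h, w, k, c):
    # Depthwise convolution: each channel is filtered independently.
    return h * w * k * k * c

h = w = 56; k = 3; c = 128
std = conv_macs(h, w, k, c, c)
dw = depthwise_macs(h, w, k, c)
print(f"standard: {std} MACs, depthwise: {dw} MACs ({std // dw}x fewer)")
```

The ratio equals the output-channel count, which is exactly the reduction dimension a GEMM engine would otherwise exploit.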

Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations

H Esmaeilzadeh, S Ghodrati, AB Kahng… - arXiv preprint arXiv …, 2023 - arxiv.org
Today's performance analysis frameworks for deep learning accelerators suffer from two
significant limitations. First, although modern convolutional neural networks (CNNs) consist of …

Performance analysis of CNN inference/training with convolution and non-convolution operations on ASIC accelerators

H Esmaeilzadeh, S Ghodrati, AB Kahng… - ACM Transactions on …, 2024 - dl.acm.org
Today's performance analysis frameworks for deep learning accelerators suffer from two
significant limitations. First, although modern convolutional neural networks (CNNs) consist …

DNN Model Theft Through Trojan Side-Channel on Edge FPGA Accelerator

S Chandrasekar, SK Lam, S Thambipillai - International Symposium on …, 2023 - Springer
In this paper, we present a novel hardware trojan assisted side-channel attack to reverse
engineer DNN architectures on edge FPGA accelerators. In particular, our attack targets the …

Optimization of the Versatile Tensor Accelerator (VTA) Load Module in a Time-Triggered Memory Access

AM Ezekiel, D Onwuchekwa… - 2023 26th Euromicro …, 2023 - ieeexplore.ieee.org
Embedded systems powered by artificial intelligence (AI) are widely employed in diverse
domains. However, the lack of inherent predictability in existing AI accelerators poses …

Exploration for Efficient Depthwise Separable Convolution Networks Deployment on FPGA

Z Huang, A Qie, C Zhang, J Yang… - 2024 IEEE 6th …, 2024 - ieeexplore.ieee.org
Depthwise Separable Convolution (DSC) has become the key structure in lightweight
convolutional neural networks. However, the tight connection between network structure and …
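Depthwise Separable Convolution factors a standard convolution into a per-channel depthwise stage followed by a 1x1 pointwise stage, which is where the parameter savings of lightweight CNNs come from. A short illustrative sketch (assumed layer sizes, not taken from the paper):

```python
# Parameter counts showing the compression behind depthwise separable
# convolution (DSC), the structure this paper targets for FPGA deployment.
def standard_conv_params(k, c_in, c_out):
    # One k x k filter per (input channel, output channel) pair.
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 256, 256
print(standard_conv_params(k, c_in, c_out))  # 589824
print(dsc_params(k, c_in, c_out))            # 67840
```

The tight coupling between this factored structure and the hardware datapath is what makes the deployment exploration in the paper non-trivial.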

Software-driven Design for Domain-specific Compute

DA Kirkpatrick - Proceedings of the 2023 International Symposium on …, 2023 - dl.acm.org
The end of Dennard scaling has created a focus on advancing domain-specific computing;
we are seeing a renaissance of accelerating compute problems through specialization, with …