Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads

H Fan, SI Venieris, A Kouris, N Lane - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Running multiple deep neural networks (DNNs) in parallel has become an emerging
workload in both edge devices, such as mobile phones, where multiple tasks serve a single …

DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search

S Zeng, Z Zhu, J Liu, H Zhang, G Dai, Z Zhou… - Proceedings of the 56th …, 2023 - dl.acm.org
Embedding retrieval is a crucial task for recommendation systems. Graph-based
approximate nearest neighbor search (GANNS) is the most commonly used method for …

A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration

Y Li, A Louri, A Karanth - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Large-scale deep neural network (DNN) accelerators are poised to facilitate the concurrent
processing of diverse DNNs, imposing demanding challenges on the interconnection fabric …

Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-based Accelerators

A Das, E Russo, M Palesi - IEEE Transactions on Computers, 2024 - ieeexplore.ieee.org
The need to efficiently execute different Deep Neural Networks (DNNs) on the same
computing platform, coupled with the requirement for easy scalability, makes Multi-Chip …

Energy-Efficient, Delay-Constrained Edge Computing of a Network of DNNs

M Ghasemi, S Heidari, YG Kim, CJ Wu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
This paper presents a novel approach for executing the inference of a network of pre-trained
deep neural networks (DNNs) on commercial off-the-shelf devices that are deployed at the …

EXPRESS: A Framework for Execution Time Prediction of Concurrent CNNs on Xilinx DPU Accelerator

S Goel, R Kedia, R Sen, M Balakrishnan - ACM Transactions on …, 2024 - dl.acm.org
The Deep Learning Processor Unit (DPU) is a highly configurable CNN accelerator that supports
a variety of CNNs and can be implemented with multiple instances on the same FPGA. Many …

ElasticBatch: A Learning-Augmented Elastic Scheduling System for Batch Inference on MIG

J Qi, W Xiao, M Li, C Yang, Y Li, W Lin… - … on Parallel and …, 2024 - ieeexplore.ieee.org
As deep learning (DL) technologies become ubiquitous, GPU clusters are deployed for
inference tasks with consistent service level objectives (SLOs). Efficiently utilizing multiple …

M2M: A Fine-Grained Mapping Framework to Accelerate Multiple DNNs on a Multi-Chiplet Architecture

J Zhang, X Wang, Y Ye, D Lyu, G Xiong… - … Transactions on Very …, 2024 - ieeexplore.ieee.org
With the advancement of artificial intelligence, the collaboration of multiple deep neural
networks (DNNs) has been crucial to existing embedded systems and cloud systems …

Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time

M Risso, A Burrello, DJ Pagliari - arXiv preprint arXiv:2409.18566, 2024 - arxiv.org
The demand for executing Deep Neural Networks (DNNs) with low latency and minimal
power consumption at the edge has led to the development of advanced heterogeneous …

Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization

K Balaskas, H Khdr, MB Sikal, F Kreß… - IEEE Embedded …, 2024 - ieeexplore.ieee.org
The significant advancements of deep neural networks (DNNs) in a wide range of
application domains have spawned the need for more specialized, sophisticated solutions in …