BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach

Z Zheng, Z Pan, D Wang, K Zhu, W Zhao… - Proceedings of the …, 2023 - dl.acm.org
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …

Adaptive partitioning and efficient scheduling for distributed DNN training in heterogeneous IoT environment

B Huang, X Huang, X Liu, C Ding, Y Yin… - Computer …, 2024 - Elsevier
With the increasing proliferation of Internet-of-Things (IoT) devices, there is a growing trend
toward training a deep neural network (DNN) model in pipeline parallelism across resource …

RECom: A compiler approach to accelerating recommendation model inference with massive embedding columns

Z Pan, Z Zheng, F Zhang, R Wu, H Liang… - Proceedings of the 28th …, 2023 - dl.acm.org
Embedding columns are important for deep recommendation models to achieve high
accuracy, but they can be very time-consuming during inference. Machine learning (ML) …

TiMePReSt: Time and Memory Efficient Pipeline Parallel DNN Training with Removed Staleness

A Dutta, N Chaki, RK De - arXiv preprint arXiv:2410.14312, 2024 - arxiv.org
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization, where a single iteration of training is split over the accelerators. Current …

A Comparative Study of Neural Network Compilers on ARMv8 Architecture

T Anthimopulos, G Keramidas, V Kelefouras… - … on Architecture of …, 2023 - Springer
The deployment of Deep Neural Network (DNN) models on far edge devices is a
challenging task, because these devices are characterized by scarce resources. To address …