BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach

Z Zheng, Z Pan, D Wang, K Zhu, W Zhao… - Proceedings of the …, 2023 - dl.acm.org
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …

Adaptive partitioning and efficient scheduling for distributed DNN training in heterogeneous IoT environment

B Huang, X Huang, X Liu, C Ding, Y Yin… - Computer …, 2024 - Elsevier
With the increasing proliferation of Internet-of-Things (IoT) devices, there is a growing trend
toward training a deep neural network (DNN) model in pipeline parallelism across resource …

RECom: A compiler approach to accelerating recommendation model inference with massive embedding columns

Z Pan, Z Zheng, F Zhang, R Wu, H Liang… - Proceedings of the 28th …, 2023 - dl.acm.org
Embedding columns are important for deep recommendation models to achieve high
accuracy, but they can be very time-consuming during inference. Machine learning (ML) …

TiMePReSt: Time and Memory Efficient Pipeline Parallel DNN Training with Removed Staleness

A Dutta, N Chaki, RK De - arXiv preprint arXiv:2410.14312, 2024 - arxiv.org
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization, where a single iteration of training is split over the accelerators. Current …

A Comparative Study of Neural Network Compilers on ARMv8 Architecture

T Anthimopulos, G Keramidas, V Kelefouras… - … on Architecture of …, 2023 - Springer
The deployment of Deep Neural Network (DNN) models on far edge devices is a
challenging task, because these devices are characterized by scarce resources. To address …