Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Offloading machine learning to programmable data planes: A systematic survey

R Parizotto, BL Coelho, DC Nunes, I Haque… - ACM Computing …, 2023 - dl.acm.org
The demand for machine learning (ML) has increased significantly in recent decades,
enabling several applications, such as speech recognition, computer vision, and …

Scaling distributed machine learning with {In-Network} aggregation

A Sapio, M Canini, CY Ho, J Nelson, P Kalnis… - … USENIX Symposium on …, 2021 - usenix.org
Training machine learning models in parallel is an increasingly important workload. We
accelerate distributed parallel training by designing a communication primitive that uses a …
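
The snippet stops before describing the primitive itself; as a rough illustration of what switch-side aggregation accomplishes, the sketch below simulates an aggregation pool on the host: each worker streams fixed-size gradient chunks into per-slot accumulators, and a slot releases its sum once every worker has contributed. The class and constant names (SwitchAggregator, CHUNK) are hypothetical, and this is a plain-Python model of the idea, not SwitchML's P4/RDMA implementation.

```python
import numpy as np

# Illustrative only: a host-side simulation of switch-side aggregation
# (per-slot accumulation of fixed-size gradient chunks from all workers).

CHUNK = 4          # elements per aggregation slot (tiny for readability)
N_WORKERS = 3

class SwitchAggregator:
    """Sums one chunk per slot; releases the result once every worker has contributed."""
    def __init__(self, n_workers):
        self.n_workers = n_workers
        self.slots = {}          # slot_id -> (running sum, contributions seen)

    def ingest(self, slot_id, chunk):
        acc, seen = self.slots.get(slot_id, (np.zeros_like(chunk), 0))
        acc, seen = acc + chunk, seen + 1
        if seen == self.n_workers:      # all contributions in: broadcast and free the slot
            del self.slots[slot_id]
            return acc
        self.slots[slot_id] = (acc, seen)
        return None                     # still waiting for other workers

# Each worker holds a gradient; the "switch" returns the element-wise sum chunk by chunk.
grads = [np.arange(8, dtype=np.float32) * (w + 1) for w in range(N_WORKERS)]
switch = SwitchAggregator(N_WORKERS)
reduced = {}
for w, g in enumerate(grads):
    for slot_id, start in enumerate(range(0, g.size, CHUNK)):
        out = switch.ingest(slot_id, g[start:start + CHUNK])
        if out is not None:
            reduced[slot_id] = out

result = np.concatenate([reduced[s] for s in sorted(reduced)])
assert np.allclose(result, sum(grads))  # same answer as an ordinary allreduce sum
```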

Grace: A compressed communication framework for distributed machine learning

H Xu, CY Ho, AM Abdelmoniem, A Dutta… - 2021 IEEE 41st …, 2021 - ieeexplore.ieee.org
Powerful computer clusters are used nowadays to train complex deep neural networks
(DNNs) on large datasets. Distributed training is increasingly communication-bound …
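
Compression frameworks of this kind typically hide gradient compression behind a compress/decompress pair that plugs in before and after communication. The sketch below shows that pattern with a simple scaled-sign compressor; the class and method names here are hypothetical illustrations of the pattern, not the framework's actual API.

```python
import numpy as np

class SignCompressor:
    """1-bit sign compression with a per-tensor mean-magnitude scale (a common baseline)."""
    def compress(self, grad):
        scale = np.abs(grad).mean()              # one float of side information
        signs = np.signbit(grad)                 # 1 bit per element (bools here)
        return (np.packbits(signs), scale), grad.shape

    def decompress(self, payload, shape):
        packed, scale = payload
        signs = np.unpackbits(packed, count=int(np.prod(shape))).astype(bool)
        values = np.where(signs, -scale, scale).astype(np.float32)
        return values.reshape(shape)

# Usage: each worker compresses its gradient before communication; the receiver
# decompresses before averaging or applying the update.
comp = SignCompressor()
grad = np.random.randn(4, 4).astype(np.float32)
payload, shape = comp.compress(grad)
approx = comp.decompress(payload, shape)
assert approx.shape == grad.shape
```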

Natural compression for distributed deep learning

S Horváth, CY Ho, L Horvath, AN Sahu… - Mathematical and …, 2022 - proceedings.mlr.press
Modern deep learning models are often trained in parallel over a collection of distributed
machines to reduce training time. In such settings, communication of model updates among …
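
The mechanism named in the title rounds each value stochastically to one of the two nearest powers of two, with probabilities chosen so the compressed value matches the original in expectation. Below is a small illustrative reimplementation of that rounding rule, not the authors' code; natural_compress is a name chosen here.

```python
import numpy as np

def natural_compress(x, rng=None):
    """Round each entry to the power of two just below or just above it, unbiasedly."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)
    out = np.zeros_like(x)
    nz = mag > 0
    low = 2.0 ** np.floor(np.log2(mag[nz]))   # nearest power of two at or below |x|
    p_up = (mag[nz] - low) / low              # P(round up to 2*low); 0 if |x| is a power of two
    up = rng.random(p_up.shape) < p_up
    out[nz] = sign[nz] * np.where(up, 2.0 * low, low)
    return out

# Unbiasedness check: the average of many compressions approaches the original vector.
x = np.array([0.3, -1.7, 5.0, 0.0])
samples = np.stack([natural_compress(x) for _ in range(20000)])
print(samples.mean(axis=0))   # close to [0.3, -1.7, 5.0, 0.0]
```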

Near-optimal sparse allreduce for distributed deep learning

S Li, T Hoefler - Proceedings of the 27th ACM SIGPLAN Symposium on …, 2022 - dl.acm.org
Communication overhead is one of the major obstacles to training large deep learning models
at scale. Gradient sparsification is a promising technique to reduce the communication …
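
As a reference point for what gradient sparsification means here, the sketch below keeps only the k largest-magnitude entries per worker and combines the sparse (index, value) contributions with a scatter-add. This is the naive baseline that near-optimal sparse allreduce schemes improve on; the function names and structure are assumptions of this sketch, not the paper's algorithm.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries; return their indices and values."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def sparse_allreduce(sparse_parts, length):
    """Sum (indices, values) pairs from all workers into one dense vector."""
    total = np.zeros(length, dtype=np.float32)
    for idx, vals in sparse_parts:
        np.add.at(total, idx, vals)   # scatter-add handles overlapping indices
    return total

# Example: 4 workers each keep their top-2 entries of an 8-element gradient.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(8).astype(np.float32) for _ in range(4)]
parts = [topk_sparsify(g, k=2) for g in grads]
print(sparse_allreduce(parts, length=8))
```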

From luna to solar: the evolutions of the compute-to-storage networks in alibaba cloud

R Miao, L Zhu, S Ma, K Qian, S Zhuang, B Li… - Proceedings of the …, 2022 - dl.acm.org
This paper presents the two generations of storage network stacks that reduced the average
I/O latency of Alibaba Cloud's EBS service by 72% in the last five years: Luna, a user-space …

Advancements in accelerating deep neural network inference on aiot devices: A survey

L Cheng, Y Gu, Q Liu, L Yang, C Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The amalgamation of artificial intelligence with Internet of Things (AIoT) devices has seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …

Flare: Flexible in-network allreduce

D De Sensi, S Di Girolamo, S Ashkboos, S Li… - Proceedings of the …, 2021 - dl.acm.org
The allreduce operation is one of the most commonly used communication routines in
distributed applications. To improve its bandwidth and to reduce network traffic, this …
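
For readers unfamiliar with the primitive being accelerated: allreduce leaves every process holding the element-wise reduction of all processes' inputs. The sketch below simulates a standard ring allreduce (reduce-scatter followed by allgather) on a single host; it only illustrates the semantics and involves no switch, so it is not Flare's in-network mechanism.

```python
import numpy as np

def ring_allreduce(chunks_per_rank):
    """chunks_per_rank[r] is rank r's input vector pre-split into P equal chunks."""
    P = len(chunks_per_rank)
    data = [list(c) for c in chunks_per_rank]   # mutable per-rank buffers

    # Reduce-scatter: after P-1 steps, rank r holds the full sum of chunk (r+1) % P.
    for step in range(P - 1):
        for r in range(P):
            i = (r - step) % P
            data[(r + 1) % P][i] = data[(r + 1) % P][i] + data[r][i]

    # Allgather: circulate the completed chunks so every rank ends with every sum.
    for step in range(P - 1):
        for r in range(P):
            i = (r + 1 - step) % P
            data[(r + 1) % P][i] = data[r][i]

    return [np.concatenate(d) for d in data]

P = 4
inputs = [np.full(8, r, dtype=np.float32) for r in range(P)]   # rank r contributes all-r
chunks = [np.array_split(x, P) for x in inputs]
outputs = ring_allreduce(chunks)
assert all(np.allclose(o, sum(inputs)) for o in outputs)       # every rank gets 0+1+2+3
```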

GRID: Gradient routing with in-network aggregation for distributed training

J Fang, G Zhao, H Xu, C Wu… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
As the scale of distributed training increases, so does the communication overhead in
clusters. Some works try to reduce the communication cost through gradient compression or …