Accelerating Distributed DNN Training via Transport Layer Scheduling

Q Duan, C Peng, Z Wang, Y Xu, S Liu… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Communication scheduling is crucial to accelerate the training of large deep learning
models, in which the transmission order of layer-wise deep neural network (DNN) tensors is …
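The general technique behind such scheduling is to reorder gradient transmissions by layer priority: gradients of layers near the input are produced last in the backward pass but are needed first in the next forward pass, so sending them first hides more communication. Below is a minimal illustrative sketch of that priority idea; the class names, chunk size, and `send` callback are assumptions for illustration, not the mechanism proposed in this paper.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class TensorChunk:
    priority: int                      # lower value = layer closer to the input = send first
    payload: bytes = field(compare=False)

class PriorityScheduler:
    """Illustrative priority-based transmission scheduler (not the paper's algorithm)."""

    def __init__(self, chunk_bytes=512 * 1024):
        self.queue = []                # min-heap keyed on layer priority
        self.chunk_bytes = chunk_bytes

    def enqueue(self, layer_index, grad_bytes):
        # Split each gradient into chunks so a high-priority tensor can get ahead
        # of an already queued low-priority one at chunk granularity.
        for i in range(0, len(grad_bytes), self.chunk_bytes):
            heapq.heappush(self.queue,
                           TensorChunk(layer_index, grad_bytes[i:i + self.chunk_bytes]))

    def drain(self, send):
        # Transmit all queued chunks strictly in priority order.
        while self.queue:
            send(heapq.heappop(self.queue).payload)
```

In a real system, enqueue and drain run concurrently with backward computation; the sketch only fixes the ordering policy.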

Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models

W Wang, Z Lai, S Li, W Liu, K Ge, Y Liu… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Mixture of Experts (MoE) has received increasing attention for scaling DNN models to extremely large sizes with a negligible increase in computation. The MoE model has achieved the …
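For context, MoE keeps per-token computation roughly constant by routing each token to only a few experts, which is also why uneven routing becomes a load-balancing problem. The sketch below is a generic top-k dispatch with a per-expert capacity cap, included only to make the imbalance concrete; it is not Prophet's fine-grained balancing scheme, and the shapes and capacity rule are assumptions.

```python
import numpy as np

def topk_dispatch(gate_logits, k=2, capacity_factor=1.25):
    """Generic top-k MoE routing with an expert capacity cap (illustrative only)."""
    num_tokens, num_experts = gate_logits.shape
    capacity = int(capacity_factor * num_tokens * k / num_experts)
    topk = np.argsort(-gate_logits, axis=1)[:, :k]   # k chosen experts per token
    assignments = {e: [] for e in range(num_experts)}
    overflow = []                                    # tokens a full expert had to refuse
    for t in range(num_tokens):
        for e in topk[t]:
            if len(assignments[e]) < capacity:
                assignments[e].append(t)
            else:
                overflow.append((t, int(e)))
    return assignments, overflow

# Skewed gate scores push some experts to their capacity while others sit idle;
# that skew is the imbalance fine-grained load balancing tries to remove.
assignments, overflow = topk_dispatch(np.random.randn(8, 4))
```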

US-Byte: An Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning

Y Gao, B Hu, MB Mashhadi, AL Jin… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The communication bottleneck severely constrains the scalability of distributed deep
learning, and efficient communication scheduling accelerates distributed DNN training by …

An efficient bandwidth-adaptive gradient compression algorithm for distributed training of deep neural networks

Z Wang, Q Duan, Y Xu, L Zhang - Journal of Systems Architecture, 2024 - Elsevier
In distributed deep learning with data parallelism, the communication bottleneck throttles the efficiency of model training. Recent studies adopt versatile gradient compression …
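Gradient compression trades extra computation for fewer bytes on the wire, and a bandwidth-adaptive scheme picks the compression ratio from the currently observed link speed. The rough sketch below combines plain top-k sparsification with a hypothetical linear adaptation rule; both the rule and the parameter names are illustrative assumptions, not the algorithm of this paper.

```python
import numpy as np

def topk_sparsify(grad, ratio):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    flat = grad.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]                            # (indices, values) to transmit

def adaptive_ratio(bandwidth_mbps, low=0.001, high=0.1, min_bw=1.0, max_bw=100.0):
    """Hypothetical rule: compress harder (smaller ratio) when the link is slower."""
    bw = float(np.clip(bandwidth_mbps, min_bw, max_bw))
    return low + (bw - min_bw) / (max_bw - min_bw) * (high - low)

grad = np.random.randn(1_000_000)
ratio = adaptive_ratio(bandwidth_mbps=25.0)          # e.g. estimated from recent transfers
idx, vals = topk_sparsify(grad, ratio)
```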

OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning

Y Gao, Z Zhang, B Hu, AL Jin, C Wu - Parallel Computing, 2023 - Elsevier
The communication bottleneck has severely restricted the scalability of distributed deep
learning. Tensor fusion improves the scalability of data parallelism by overlapping …
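Tensor fusion amortizes per-message overhead by merging many small gradients into one collective call while still overlapping communication with the backward pass (the WFBP pattern). A bare-bones bucket sketch of that general idea follows; the fixed threshold and the `allreduce` callback are assumptions, not OF-WFBP's near-optimal fusion plan.

```python
import numpy as np

class FusionBucket:
    """Fuse small gradients and flush one collective per bucket (WFBP-style sketch)."""

    def __init__(self, allreduce, threshold_bytes=25 * 1024 * 1024):
        self.allreduce = allreduce       # assumed callback performing the collective
        self.threshold = threshold_bytes
        self.pending, self.pending_bytes = [], 0

    def add(self, grad):
        # Called as backward produces each layer's gradient, last layer first.
        self.pending.append(grad)
        self.pending_bytes += grad.nbytes
        if self.pending_bytes >= self.threshold:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        fused = np.concatenate([g.ravel() for g in self.pending])
        self.allreduce(fused)            # one fused call instead of many small ones
        self.pending, self.pending_bytes = [], 0
```

The threshold is the crux: too small and per-call overhead dominates, too large and the last bucket has no remaining computation to hide behind, which is the trade-off such fusion mechanisms optimize.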

Host-driven In-Network Aggregation on RDMA

Y Li, W Li, Y Yao, Y Du, K Li - IEEE INFOCOM 2024-IEEE …, 2024 - ieeexplore.ieee.org
Large-scale datacenter networks are increasingly using in-network aggregation (INA) and
remote direct memory access (RDMA) techniques to accelerate deep neural network (DNN) …
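In-network aggregation moves the summation of gradients into the network path so that each worker sends its data once instead of exchanging it with every peer. The toy model below only captures the functional behavior, a limited pool of aggregation slots that sums the same chunk from all workers; it says nothing about RDMA transport or the host-driven design in this paper, and all names and the slot limit are illustrative.

```python
import numpy as np

class ToyAggregator:
    """Functional toy of in-network aggregation: a small slot pool sums the
    same-indexed chunk from every worker, then releases the aggregated result."""

    def __init__(self, num_workers, num_slots=8):
        self.num_workers = num_workers
        self.num_slots = num_slots       # stands in for scarce on-path memory
        self.slots = {}                  # chunk_id -> (partial_sum, contributions)

    def receive(self, chunk_id, chunk):
        if chunk_id not in self.slots and len(self.slots) >= self.num_slots:
            return None                  # pool full: the sender must retry later
        acc, count = self.slots.get(chunk_id, (np.zeros_like(chunk), 0))
        acc, count = acc + chunk, count + 1
        if count == self.num_workers:    # every worker contributed this chunk
            del self.slots[chunk_id]
            return acc                   # would be multicast back to all workers
        self.slots[chunk_id] = (acc, count)
        return None
```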

AOCC-FL: Federated Learning with Aligned Overlapping via Calibrated Compensation

H Wang, W Xu, Y Fan, R Li… - IEEE INFOCOM 2023-IEEE …, 2023 - ieeexplore.ieee.org
Federated Learning enables collaborative model training among a number of distributed devices under the coordination of a centralized server, where each device alternately …
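For context, the coordination pattern described here is the standard federated averaging loop: the server broadcasts a global model, each device trains on its local data, and the server averages the returned models. The sketch below is plain FedAvg with a stand-in local trainer, not AOCC-FL's aligned overlapping or calibrated compensation.

```python
import numpy as np

def local_train(model, lr=0.01, steps=10):
    """Stand-in for on-device training; a real client would use its private data."""
    for _ in range(steps):
        model = model - lr * np.random.randn(*model.shape) * 0.01
    return model

def federated_round(global_model, num_devices):
    # Server broadcasts the global model; each device trains a local copy.
    local_models = [local_train(global_model.copy()) for _ in range(num_devices)]
    # Server aggregates by (unweighted) averaging the returned local models.
    return np.mean(local_models, axis=0)

global_model = np.zeros(128)
for _ in range(5):                        # five communication rounds
    global_model = federated_round(global_model, num_devices=8)
```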

Libra: Contention-Aware GPU Thread Allocation for Data Parallel Training in High Speed Networks

Y Liu, B Jiang, S Zhao, T Lin, X Wang… - IEEE INFOCOM 2023 …, 2023 - ieeexplore.ieee.org
Overlapping gradient communication with backward computation is a popular technique for reducing communication cost in widely adopted data-parallel S-SGD training. However …
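The overlap itself is commonly realized by handing each freshly computed gradient to a communication worker while the backward pass continues with the next layer. The toy simulation below shows that pipeline with a thread pool standing in for communication streams; the sleep times and pool size are arbitrary assumptions, and the pool size is exactly the kind of resource-allocation knob that contention-aware schemes such as Libra tune.

```python
import time
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def backward_layer(i):
    time.sleep(0.01)                      # stand-in for computing layer i's gradient
    return np.random.randn(1000)

def communicate(grad):
    time.sleep(0.02)                      # stand-in for all-reducing one gradient
    return grad

comm_pool = ThreadPoolExecutor(max_workers=2)         # how many comm workers to allocate?
handles = []
for layer in reversed(range(10)):                     # backward visits the last layer first
    grad = backward_layer(layer)
    handles.append(comm_pool.submit(communicate, grad))  # overlaps with the next layer
reduced = [h.result() for h in handles]               # synchronize before the optimizer step
comm_pool.shutdown()
```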

Tackling the Communication Bottlenecks of Distributed Deep Learning Training Workloads

CY Ho - 2023 - repository.kaust.edu.sa
Deep Neural Networks (DNNs) find widespread applications across various domains,
including computer vision, recommendation systems, and natural language processing …

Accelerating Deep Neural Network Training on Optical Interconnect Systems

F Dai - 2023 - ourarchive.otago.ac.nz
As deep learning (DL) algorithms evolve and data volumes expand, training deep neural
networks (DNNs) has become essential across various domains, delivering unprecedented …