Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Communication-efficient distributed learning: An overview

X Cao, T Başar, S Diggavi, YC Eldar… - IEEE Journal on …, 2023 - ieeexplore.ieee.org
Distributed learning is envisioned as the bedrock of next-generation intelligent networks,
where intelligent agents, such as mobile devices, robots, and sensors, exchange information …

1-bit Adam: Communication efficient large-scale training with Adam's convergence speed

H Tang, S Gan, AA Awan… - International …, 2021 - proceedings.mlr.press
Scalable training of large models (like BERT and GPT-3) requires careful optimization
rooted in model design, architecture, and system capabilities. From a system standpoint …
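
(Illustrative aside: the "1-bit" idea here refers to compressing each coordinate of an optimizer update to its sign plus a per-tensor scale, with the quantization error fed back into the next step. The sketch below, using hypothetical names and assuming NumPy tensors, shows that generic error-feedback construction; it is not the paper's exact 1-bit Adam algorithm, which additionally runs a full-precision Adam warm-up stage and freezes the variance term before compression starts.)

```python
import numpy as np

def one_bit_compress(update, error):
    """Sign-based 1-bit compression with error feedback (generic sketch).

    `update` is the dense optimizer update for one tensor and `error` is the
    residual carried over from the previous step; both are NumPy arrays of the
    same shape. Returns the compressed update and the new residual.
    """
    compensated = update + error                # fold the previous residual back in
    scale = np.abs(compensated).mean()          # one float transmitted per tensor
    compressed = scale * np.sign(compensated)   # effectively 1 bit per element
    new_error = compensated - compressed        # kept locally, never transmitted
    return compressed, new_error
```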

Dynamic aggregation for heterogeneous quantization in federated learning

S Chen, C Shen, L Zhang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Communication is widely known as the primary bottleneck of federated learning, and
quantization of local model updates before uploading to the parameter server is an effective …
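
(Illustrative aside: "quantization of local model updates" typically means each client maps its update onto a small set of discrete levels before upload. The sketch below, with hypothetical names, shows a standard unbiased stochastic uniform quantizer as one such scheme; the cited paper's focus is on how the server aggregates updates quantized at heterogeneous precisions, which this toy routine does not cover.)

```python
import numpy as np

def stochastic_quantize(delta, num_bits=4, rng=None):
    """Unbiased stochastic uniform quantization of a local model update.

    Maps `delta` onto 2**num_bits evenly spaced levels between its min and max,
    rounding up or down at random so the result equals `delta` in expectation.
    """
    if rng is None:
        rng = np.random.default_rng()
    levels = 2 ** num_bits - 1
    lo, hi = delta.min(), delta.max()
    if hi == lo:                                    # constant tensor: nothing to quantize
        return delta.copy()
    normalized = (delta - lo) / (hi - lo) * levels  # scale into [0, levels]
    floor = np.floor(normalized)
    quantized = floor + (rng.random(delta.shape) < (normalized - floor))
    return lo + quantized / levels * (hi - lo)      # de-quantized value seen by the server
```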

ProgFed: effective, communication, and computation efficient federated learning by progressive training

HP Wang, S Stich, Y He, M Fritz - … Conference on Machine …, 2022 - proceedings.mlr.press
Federated learning is a powerful distributed learning scheme that allows numerous edge
devices to collaboratively train a model without sharing their data. However, training is …

Federated learning over noisy channels: Convergence analysis and design examples

X Wei, C Shen - IEEE Transactions on Cognitive …, 2022 - ieeexplore.ieee.org
Does Federated Learning (FL) work when both uplink and downlink communications have
errors? How much communication noise can FL handle and what is its impact on the …

Edge learning: The enabling technology for distributed big data analytics in the edge

J Zhang, Z Qu, C Chen, H Wang, Y Zhan, B Ye… - ACM Computing …, 2021 - dl.acm.org
Machine Learning (ML) has demonstrated great promise in various fields, e.g., self-driving and
smart cities, which are fundamentally altering the way individuals and organizations live, work …

Enhancing the robustness of object detection via 6G vehicular edge computing

C Chen, G Yao, C Wang, S Goudos, S Wan - Digital Communications and …, 2022 - Elsevier
Academic and industrial communities have been paying significant attention to the 6th
Generation (6G) wireless communication systems after the commercial deployment of 5G …

TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs

W Wang, M Khazraee, Z Zhong, M Ghobadi… - … USENIX Symposium on …, 2023 - usenix.org
We propose TopoOpt, a novel direct-connect fabric for deep neural network (DNN) training
workloads. TopoOpt co-optimizes the distributed training process across three dimensions …

Adaptive message quantization and parallelization for distributed full-graph GNN training

B Wan, J Zhao, C Wu - Proceedings of Machine Learning …, 2023 - proceedings.mlsys.org
Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is
bandwidth-demanding and time-consuming. Frequent exchanges of node features …