GRACE: A compressed communication framework for distributed machine learning

H Xu, CY Ho, AM Abdelmoniem, A Dutta… - 2021 IEEE 41st …, 2021 - ieeexplore.ieee.org
Powerful computer clusters are nowadays used to train complex deep neural networks
(DNNs) on large datasets. Distributed training is increasingly becoming communication-bound …
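
GRACE unifies many gradient compressors behind a common compress/decompress pair so they can be swapped inside the same training loop. A minimal sketch of such an interface, with a scaled 1-bit sign compressor as one example (class and method names here are illustrative, not GRACE's actual API):

```python
import numpy as np

class Compressor:
    """Hypothetical compress/decompress interface of the kind a GRACE-style
    framework exposes; concrete compressors plug into the same training loop."""
    def compress(self, grad):
        raise NotImplementedError
    def decompress(self, payload, shape):
        raise NotImplementedError

class SignCompressor(Compressor):
    """1-bit sign compression, rescaled by the mean magnitude so the
    decompressed gradient keeps roughly the right scale."""
    def compress(self, grad):
        flat = grad.ravel()
        scale = np.abs(flat).mean()
        return (np.sign(flat).astype(np.int8), scale), grad.shape
    def decompress(self, payload, shape):
        signs, scale = payload
        return (scale * signs.astype(np.float32)).reshape(shape)

# usage: compress on the worker, decompress before the optimizer step
g = np.random.randn(4, 3).astype(np.float32)
payload, shape = SignCompressor().compress(g)
g_hat = SignCompressor().decompress(payload, shape)
```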

Compressed communication for distributed deep learning: Survey and quantitative evaluation

H Xu, CY Ho, AM Abdelmoniem, A Dutta, EH Bergou… - 2020 - repository.kaust.edu.sa
Powerful computer clusters are nowadays used to train complex deep neural networks
(DNNs) on large datasets. Distributed training workloads are increasingly becoming …

Sparse binary compression: Towards distributed deep learning with minimal communication

F Sattler, S Wiedemann, KR Müller… - 2019 International Joint …, 2019 - ieeexplore.ieee.org
Currently, progressively larger deep neural networks are trained on ever-growing data
corpora. As a result, distributed training schemes are becoming increasingly relevant. A major …
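
The title names a concrete recipe: sparsify the (residual-accumulated) gradient, then binarize the surviving values. A rough sketch of the sparsify-and-binarize step, loosely following the paper's description; residual accumulation and the Golomb coding of indices are omitted:

```python
import numpy as np

def sparse_binary_compress(grad, keep_fraction=0.01):
    """Keep only the largest-magnitude entries, then represent each kept entry
    by a shared mean magnitude and its sign (rough sketch of the sparse-binary
    idea, not the authors' full method)."""
    flat = grad.ravel()
    k = max(1, int(keep_fraction * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest magnitudes
    kept = flat[idx]
    mean_mag = np.abs(kept).mean()                 # one scalar sent for all kept entries
    out = np.zeros_like(flat)
    out[idx] = np.sign(kept) * mean_mag            # payload: indices + signs + mean_mag
    return out.reshape(grad.shape)

# usage
g = np.random.randn(4096).astype(np.float32)
g_hat = sparse_binary_compress(g, keep_fraction=0.01)
```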

ScaleCom: Scalable sparsified gradient compression for communication-efficient distributed training

CY Chen, J Ni, S Lu, X Cui, PY Chen… - Advances in …, 2020 - proceedings.neurips.cc
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art
platforms is expected to be severely communication-constrained. To overcome this …
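
ScaleCom belongs to the family of sparsified-gradient methods that communicate only the largest-magnitude entries and keep the remainder in a local residual (error feedback). A plain top-k baseline from that family, as a sketch; this is not ScaleCom's own scaled or cyclic selection scheme:

```python
import numpy as np

def topk_with_error_feedback(grad, residual, k):
    """Plain top-k sparsification with error feedback: add back the locally
    stored residual, send the k largest-magnitude entries, keep the rest."""
    acc = grad + residual
    flat = acc.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # k largest magnitudes
    values = flat[idx].copy()
    new_residual = acc.copy()
    new_residual.ravel()[idx] = 0.0                # unsent remainder stays local
    return (idx, values), new_residual

# usage sketch: the (idx, values) payload is what gets communicated
g = np.random.randn(1024).astype(np.float32)
residual = np.zeros_like(g)
payload, residual = topk_with_error_feedback(g, residual, k=32)
```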

Variance-based gradient compression for efficient distributed deep learning

Y Tsuzuku, H Imachi, T Akiba - arXiv preprint arXiv:1802.06058, 2018 - arxiv.org
Due to the substantial computational cost, training state-of-the-art deep neural networks for
large-scale datasets often requires distributed training using multiple computation workers …
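
The core idea is to transmit only gradient coordinates whose mini-batch mean clearly exceeds the noise of its estimate, delaying the rest. A rough signal-to-noise reading of that gate; the threshold rule below is illustrative, not the paper's exact criterion:

```python
import numpy as np

def variance_gate(per_sample_grads, alpha=1.0):
    """Send only coordinates whose mean gradient dominates its standard error
    across the mini-batch; the rest would be accumulated and sent later.
    Illustrative signal-to-noise rule, not the paper's exact criterion."""
    n = per_sample_grads.shape[0]
    mean = per_sample_grads.mean(axis=0)
    sem = per_sample_grads.std(axis=0) / np.sqrt(n)   # standard error of the mean
    mask = np.abs(mean) > alpha * sem
    return mean * mask, mask

# usage: rows are per-example gradients for one (flattened) parameter tensor
grads = np.random.randn(64, 1000).astype(np.float32)
sparse_grad, sent_mask = variance_gate(grads, alpha=2.0)
```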

DC2: Delay-aware compression control for distributed machine learning

AM Abdelmoniem, M Canini - IEEE INFOCOM 2021-IEEE …, 2021 - ieeexplore.ieee.org
Distributed training performs data-parallel training of DNN models, a necessity for
increasingly complex models and large datasets. Recent works are identifying major …

AC-SGD: Adaptively compressed SGD for communication-efficient distributed learning

G Yan, T Li, SL Huang, T Lan… - IEEE Journal on Selected …, 2022 - ieeexplore.ieee.org
Gradient compression (e.g., gradient quantization and gradient sparsification) is a core
technique in reducing communication costs in distributed learning systems. The recent trend …
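
The snippet names quantization and sparsification as the primitives whose compression level AC-SGD adapts. As a concrete example of the quantization side, a QSGD-style stochastic quantizer with a fixed number of levels; the adaptive level selection that gives AC-SGD its name is not shown:

```python
import numpy as np

def stochastic_quantize(grad, num_levels=16, rng=None):
    """QSGD-style stochastic quantization onto `num_levels` uniform levels.
    Unbiased: rounding up or down is randomized in proportion to the remainder."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros(grad.shape, dtype=np.int8), 0.0
    scaled = np.abs(grad) / norm * num_levels
    lower = np.floor(scaled)
    levels = lower + (rng.random(grad.shape) < (scaled - lower))
    return (np.sign(grad) * levels).astype(np.int8), norm / num_levels

def dequantize(levels, step):
    return levels.astype(np.float32) * step

# usage
g = np.random.randn(1000).astype(np.float32)
q, step = stochastic_quantize(g, num_levels=16)
g_hat = dequantize(q, step)
```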

Practical low-rank communication compression in decentralized deep learning

T Vogels, SP Karimireddy… - Advances in Neural …, 2020 - proceedings.neurips.cc
Lossy gradient compression has become a practical tool to overcome the communication
bottleneck in centrally coordinated distributed training of machine learning models …
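
Low-rank compression replaces a layer's gradient matrix with thin factors obtained by a step or two of power iteration, so workers exchange only the factors. A rank-1 sketch in that spirit, following the general PowerSGD-style recipe rather than the decentralized variant the paper develops:

```python
import numpy as np

def rank1_power_compress(grad_matrix, q_prev=None, rng=None):
    """One power-iteration step giving a rank-1 approximation outer(p, q) of
    the gradient matrix; only the vectors p and q need to be communicated.
    Passing q_prev from the previous step warm-starts the iteration."""
    rng = rng or np.random.default_rng()
    m, n = grad_matrix.shape
    q = q_prev if q_prev is not None else rng.standard_normal(n)
    p = grad_matrix @ q
    p = p / (np.linalg.norm(p) + 1e-12)   # normalize the left factor
    q = grad_matrix.T @ p                 # best right factor for this p
    return p, q

def rank1_decompress(p, q):
    return np.outer(p, q)

# usage
G = np.random.randn(256, 128).astype(np.float32)
p, q = rank1_power_compress(G)
G_hat = rank1_decompress(p, q)
```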

Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Learned gradient compression for distributed deep learning

L Abrahamyan, Y Chen, G Bekoulis… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Training deep neural networks on large datasets containing high-dimensional data requires
a large amount of computation. A solution to this problem is data-parallel distributed training …