A linear speedup analysis of distributed deep learning with sparse and quantized communication

P Jiang, G Agrawal - Advances in Neural Information …, 2018 - proceedings.neurips.cc
The large communication overhead has imposed a bottleneck on the performance of
distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous …

Understanding top-k sparsification in distributed deep learning

S Shi, X Chu, KC Cheung, S See - arXiv preprint arXiv:1911.08772, 2019 - arxiv.org
Distributed stochastic gradient descent (SGD) algorithms are widely deployed in training
large-scale deep learning models, while the communication overhead among workers …
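For readers unfamiliar with the technique named in the title, the following is a minimal NumPy sketch of generic top-k gradient sparsification: each worker transmits only the k largest-magnitude gradient entries instead of the dense tensor. It is an illustration of the general idea, not the authors' implementation, and the function and variable names are assumptions made here for clarity.

    import numpy as np

    def topk_sparsify(grad, k):
        """Keep the k largest-magnitude entries of a gradient; zero the rest.

        Returns the sparse gradient plus the (indices, values) pair a worker
        would transmit instead of the dense tensor.
        """
        flat = grad.ravel()
        # Indices of the k entries with the largest absolute value.
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        values = flat[idx]
        sparse = np.zeros_like(flat)
        sparse[idx] = values
        return sparse.reshape(grad.shape), (idx, values)

    # Example: compress a 10^6-element gradient down to 0.1% of its entries.
    grad = np.random.randn(1_000_000)
    sparse_grad, (idx, vals) = topk_sparsify(grad, k=1000)
    print(f"sent {idx.size} of {grad.size} entries")

Practical systems usually pair this selection with error feedback, accumulating the discarded entries locally so that small gradient components are not permanently lost.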

Trading redundancy for communication: Speeding up distributed SGD for non-convex optimization

F Haddadpour, MM Kamani… - International …, 2019 - proceedings.mlr.press
Communication overhead is one of the key challenges that hinder the scalability of
distributed optimization algorithms for training large neural networks. In recent years, there has …

Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Communication-efficient distributed deep learning with merged gradient sparsification on GPUs

S Shi, Q Wang, X Chu, B Li, Y Qin… - IEEE INFOCOM 2020 …, 2020 - ieeexplore.ieee.org
Distributed synchronous stochastic gradient descent (SGD) algorithms are widely used in
large-scale deep learning applications, while it is known that the communication bottleneck …
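As a rough illustration of the merging idea in the title, the sketch below concatenates several layers' gradients into one buffer before selecting the top-k entries, then splits the sparse result back into per-layer shapes. The paper's actual GPU pipelining and scheduling are not modeled, and merged_topk is a hypothetical helper name.

    import numpy as np

    def merged_topk(layer_grads, k):
        """Concatenate per-layer gradients, select top-k over the merged
        buffer, and split the sparse result back into per-layer shapes."""
        shapes = [g.shape for g in layer_grads]
        sizes = [g.size for g in layer_grads]
        merged = np.concatenate([g.ravel() for g in layer_grads])
        idx = np.argpartition(np.abs(merged), -k)[-k:]
        sparse = np.zeros_like(merged)
        sparse[idx] = merged[idx]
        splits = np.split(sparse, np.cumsum(sizes)[:-1])
        return [s.reshape(shp) for s, shp in zip(splits, shapes)]

    # Two layers merged into a single sparsified message.
    grads = [np.random.randn(3, 4), np.random.randn(10)]
    sparse_grads = merged_topk(grads, k=5)

Merging amortizes per-message overheads and lets top-k selection compare entries across layers rather than within each layer separately.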

Double quantization for communication-efficient distributed optimization

Y Yu, J Wu, L Huang - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Modern distributed training of machine learning models often suffers from high
communication overhead for synchronizing stochastic gradients and model parameters. In …
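To illustrate the general idea of quantizing both communicated quantities (worker-to-server gradients and server-to-worker model parameters), here is a sketch using a plain unbiased uniform quantizer. The schemes proposed in the paper differ in their quantizer design and analysis; this is only a generic stand-in.

    import numpy as np

    def uniform_quantize(x, bits=8):
        """Uniform stochastic quantization of a vector to `bits` bits per entry."""
        scale = np.max(np.abs(x)) + 1e-12
        levels = 2 ** (bits - 1) - 1
        scaled = x / scale * levels
        low = np.floor(scaled)
        # Stochastic rounding keeps the quantizer unbiased in expectation.
        q = low + (np.random.rand(*x.shape) < (scaled - low))
        return q * scale / levels

    # Worker -> server: quantized gradient.  Server -> worker: quantized model.
    grad = np.random.randn(4096)
    params = np.random.randn(4096)
    g_hat = uniform_quantize(grad, bits=4)
    w_hat = uniform_quantize(params, bits=4)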

Error compensated quantized SGD and its applications to large-scale distributed optimization

J Wu, W Huang, J Huang… - … Conference on Machine …, 2018 - proceedings.mlr.press
Large-scale distributed optimization is of great importance in various applications. For
data-parallel distributed learning, the inter-node gradient communication often becomes …
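The sketch below shows the generic error-feedback pattern that the title refers to: the quantization error of each step is stored locally and added back to the next gradient before compression, so that compression error is compensated over time. The 1-bit sign quantizer and the class structure are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def sign_quantize(v):
        """1-bit sign quantizer with a single magnitude scale (generic choice)."""
        return np.sign(v) * np.mean(np.abs(v))

    class ErrorCompensatedWorker:
        """Error-feedback wrapper: quantization error is carried over to the
        next step instead of being discarded."""
        def __init__(self, dim):
            self.residual = np.zeros(dim)

        def compress(self, grad):
            corrected = grad + self.residual   # add the carried-over error
            q = sign_quantize(corrected)       # what is actually transmitted
            self.residual = corrected - q      # remember what was lost
            return q

    # One simulated compression step for a single worker.
    worker = ErrorCompensatedWorker(dim=8)
    sent = worker.compress(np.random.randn(8))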

Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations

D Basu, D Data, C Karakus… - Advances in Neural …, 2019 - proceedings.neurips.cc
The communication bottleneck has been identified as a significant issue in distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …
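As a single-worker sketch of the recipe named in the title (local SGD steps followed by a sparsified and quantized update), the code below composes top-k selection with a sign quantizer after several local steps. The actual Qsparse-local-SGD algorithm additionally uses error compensation and synchronizes the compressed updates across workers; the helper names here are assumptions.

    import numpy as np

    def topk(v, k):
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    def quantize_sign(v):
        nonzero = v[v != 0]
        if nonzero.size == 0:
            return v
        return np.sign(v) * np.mean(np.abs(nonzero))

    def local_sgd_round(w, grads, lr=0.1, k=2):
        """One communication round: several local SGD steps, then a sparsified
        and quantized model update is 'sent' (here simply returned)."""
        w_local = w.copy()
        for g in grads:                       # H local steps, no communication
            w_local -= lr * g
        delta = w_local - w                   # accumulated local progress
        return quantize_sign(topk(delta, k))  # compress what is communicated

    w = np.zeros(6)
    local_grads = [np.random.randn(6) for _ in range(4)]   # H = 4 local steps
    update = local_sgd_round(w, local_grads)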

Compressed communication for distributed deep learning: Survey and quantitative evaluation

H Xu, CY Ho, AM Abdelmoniem, A Dutta, EH Bergou… - 2020 - repository.kaust.edu.sa
Powerful computer clusters are used nowadays to train complex deep neural networks
(DNNs) on large datasets. Distributed training workloads increasingly become …

Near-optimal sparse allreduce for distributed deep learning

S Li, T Hoefler - Proceedings of the 27th ACM SIGPLAN Symposium on …, 2022 - dl.acm.org
Communication overhead is one of the major obstacles to training large deep learning models
at scale. Gradient sparsification is a promising technique to reduce the communication …
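To make the aggregation step concrete, the following sketch sums per-worker (index, value) pairs into a dense buffer, which is the result a sparse allreduce must produce. The cited work is about achieving this reduction with near-optimal communication and handling index overlap efficiently, rather than the naive serial accumulation shown here.

    import numpy as np

    def sparse_allreduce(contributions, dim):
        """Sum per-worker sparse gradients given as (indices, values) pairs."""
        total = np.zeros(dim)
        for idx, vals in contributions:
            np.add.at(total, idx, vals)   # handles overlapping indices correctly
        return total

    # Three simulated workers, each sending its top-2 entries of a length-8 gradient.
    dim, k = 8, 2
    contribs = []
    for _ in range(3):
        g = np.random.randn(dim)
        idx = np.argpartition(np.abs(g), -k)[-k:]
        contribs.append((idx, g[idx]))
    reduced = sparse_allreduce(contribs, dim)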