K Haruki, T Suzuki, Y Hamakawa, T Toda… - arXiv preprint arXiv…, 2019 - arxiv.org
Large-batch stochastic gradient descent (SGD) is widely used for training in distributed deep learning because of its training-time efficiency; however, extremely large-batch SGD leads to …