Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations

D Basu, D Data, C Karakus… - Advances in Neural …, 2019 - proceedings.neurips.cc
The communication bottleneck has been identified as a significant issue in distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …

Qsparse-local-SGD: Distributed SGD with quantization, sparsification, and local computations

D Basu, D Data, C Karakus… - IEEE Journal on Selected …, 2020 - ieeexplore.ieee.org
The communication bottleneck has been identified as a significant issue in distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …
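
The title combines three compression levers: quantizing updates, sparsifying them, and taking several local SGD steps between communication rounds. The following NumPy sketch is only an illustration of that combination in a hedged form; the toy quadratic objective, the top-k-with-sign compressor, the step size, and the sync period H are all assumptions for exposition, not the paper's algorithm or experimental setup.

    import numpy as np

    def topk_sign_compress(v, k):
        """Keep the k largest-magnitude coordinates, then quantize them to
        sign(v_i) times the mean magnitude of the kept coordinates."""
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = np.sign(v[idx]) * np.mean(np.abs(v[idx]))
        return out

    rng = np.random.default_rng(0)
    d, workers, k, H, lr, T = 50, 4, 5, 10, 0.05, 200
    x = np.zeros(d)                                   # global model
    local = [x.copy() for _ in range(workers)]        # per-worker local models
    memory = [np.zeros(d) for _ in range(workers)]    # per-worker error memory
    target = rng.normal(size=d)                       # optimum of the toy quadratic loss

    for t in range(1, T + 1):
        for w in range(workers):
            g = (local[w] - target) + 0.1 * rng.normal(size=d)  # stochastic gradient
            local[w] -= lr * g                                  # local SGD step
        if t % H == 0:                                          # communicate every H steps
            updates = []
            for w in range(workers):
                delta = memory[w] + (local[w] - x)      # accumulated local progress + past error
                comp = topk_sign_compress(delta, k)     # sparsify and quantize before sending
                memory[w] = delta - comp                # keep whatever was not transmitted
                updates.append(comp)
            x = x + np.mean(updates, axis=0)            # server averages the compressed updates
            local = [x.copy() for _ in range(workers)]  # workers resynchronize
    print("distance to optimum:", np.linalg.norm(x - target))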

A convergence analysis of distributed SGD with communication-efficient gradient sparsification

S Shi, K Zhao, Q Wang, Z Tang, X Chu - IJCAI, 2019 - ijcai.org
Gradient sparsification is a promising technique to significantly reduce the communication
overhead in decentralized synchronous stochastic gradient descent (S-SGD) algorithms …
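
For context on the compressor this line of work analyzes, a top-k gradient sparsifier can be written in a few lines of NumPy; the vector size and k below are arbitrary choices for illustration.

    import numpy as np

    def topk_sparsify(grad, k):
        """Zero out everything except the k largest-magnitude entries of grad."""
        sparse = np.zeros_like(grad)
        idx = np.argpartition(np.abs(grad), -k)[-k:]
        sparse[idx] = grad[idx]
        return sparse

    g = np.random.default_rng(1).normal(size=1000)
    print(np.count_nonzero(topk_sparsify(g, 10)))  # only 10 of 1000 entries are sent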

AC-SGD: Adaptively compressed SGD for communication-efficient distributed learning

G Yan, T Li, SL Huang, T Lan… - IEEE Journal on Selected …, 2022 - ieeexplore.ieee.org
Gradient compression (e.g., gradient quantization and gradient sparsification) is a core
technique in reducing communication costs in distributed learning systems. The recent trend …
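
The snippet names gradient quantization alongside sparsification. As a generic illustration only, not AC-SGD's adaptive scheme, a QSGD-style stochastic quantizer rounds normalized magnitudes onto a small number of levels while remaining unbiased; the number of levels below is an arbitrary assumption.

    import numpy as np

    def stochastic_quantize(v, levels=4, rng=np.random.default_rng(2)):
        """Randomly round |v_i| / ||v|| onto `levels` uniform levels so that the
        quantized vector is an unbiased estimate of v."""
        norm = np.linalg.norm(v)
        if norm == 0:
            return np.zeros_like(v)
        scaled = np.abs(v) / norm * levels
        lower = np.floor(scaled)
        q = lower + (rng.random(v.shape) < (scaled - lower))  # round up with prob. = fractional part
        return np.sign(v) * q * norm / levels

    print(stochastic_quantize(np.array([0.3, -1.2, 0.05, 2.0])))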

A linear speedup analysis of distributed deep learning with sparse and quantized communication

P Jiang, G Agrawal - Advances in Neural Information …, 2018 - proceedings.neurips.cc
The large communication overhead has imposed a bottleneck on the performance of
distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous …

Sparsified SGD with memory

SU Stich, JB Cordonnier… - Advances in neural …, 2018 - proceedings.neurips.cc
Huge-scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
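
The "memory" in the title refers to an error-feedback buffer that accumulates whatever the sparsifier drops and re-injects it into later steps. Below is a minimal sketch of top-k SGD with such a memory on an assumed noisy quadratic objective; the dimension, k, and step size are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    d, k, lr, T = 100, 5, 0.1, 500
    x = np.zeros(d)
    memory = np.zeros(d)                    # residual the compressor has dropped so far
    target = rng.normal(size=d)             # optimum of the toy quadratic loss

    for _ in range(T):
        g = (x - target) + 0.1 * rng.normal(size=d)  # stochastic gradient
        corrected = memory + lr * g                  # add back the past compression error
        step = np.zeros(d)
        idx = np.argpartition(np.abs(corrected), -k)[-k:]
        step[idx] = corrected[idx]                   # top-k of the error-corrected step
        memory = corrected - step                    # remember what was dropped
        x -= step
    print("distance to optimum:", np.linalg.norm(x - target))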

rTop-k: A statistical estimation approach to distributed SGD

LP Barnes, HA Inan, B Isik… - IEEE Journal on Selected …, 2020 - ieeexplore.ieee.org
The large communication cost for exchanging gradients between different nodes
significantly limits the scalability of distributed training for large-scale learning models …
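
As the title suggests, rTop-k first restricts attention to the k largest-magnitude coordinates and then transmits a random subset of r of them. A hedged sketch of that selection rule follows; any rescaling or encoding the paper applies on top of it is omitted here.

    import numpy as np

    def rtopk(grad, k, r, rng=np.random.default_rng(4)):
        """Keep a uniformly random subset of size r from the k largest-magnitude coordinates."""
        out = np.zeros_like(grad)
        topk_idx = np.argpartition(np.abs(grad), -k)[-k:]
        chosen = rng.choice(topk_idx, size=r, replace=False)
        out[chosen] = grad[chosen]
        return out

    g = np.random.default_rng(5).normal(size=1000)
    print(np.count_nonzero(rtopk(g, k=50, r=10)))  # 10 coordinates survive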

DoubleSqueeze: Parallel stochastic gradient descent with double-pass error-compensated compression

H Tang, C Yu, X Lian, T Zhang… - … Conference on Machine …, 2019 - proceedings.mlr.press
A standard approach in large-scale machine learning is distributed stochastic gradient
training, which requires the computation of aggregated stochastic gradients over multiple …
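
The "double-pass" in the title refers to compressing both the worker-to-server messages and the aggregated server-to-worker message, each with its own error-compensation buffer. A minimal NumPy sketch under those assumptions (top-k as the compressor, a toy quadratic objective) is below.

    import numpy as np

    def topk(v, k):
        """Top-k compressor used on both the worker and server side in this sketch."""
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(6)
    d, workers, k, lr, T = 100, 4, 10, 0.1, 300
    x = np.zeros(d)
    worker_err = [np.zeros(d) for _ in range(workers)]
    server_err = np.zeros(d)
    target = rng.normal(size=d)               # optimum of the toy quadratic loss

    for _ in range(T):
        sent = []
        for w in range(workers):
            g = (x - target) + 0.1 * rng.normal(size=d)
            v = g + worker_err[w]             # worker-side error compensation
            c = topk(v, k)
            worker_err[w] = v - c             # store what the worker failed to send
            sent.append(c)
        agg = np.mean(sent, axis=0) + server_err  # server-side error compensation
        back = topk(agg, k)                       # second compression pass, server to workers
        server_err = agg - back
        x -= lr * back
    print("distance to optimum:", np.linalg.norm(x - target))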

Gradient sparsification for communication-efficient distributed optimization

J Wangni, J Wang, J Liu… - Advances in Neural …, 2018 - proceedings.neurips.cc
Modern large-scale machine learning applications require stochastic optimization
algorithms to be implemented on distributed computational architectures. A key bottleneck is …
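
A common alternative to deterministic top-k, and the flavor suggested by this title, is randomized sparsification that keeps coordinate i with some probability p_i and rescales it by 1/p_i so the compressed gradient stays unbiased. The sketch below uses magnitude-proportional probabilities capped at 1 as an assumed concrete choice, not necessarily the paper's exact probabilities.

    import numpy as np

    def random_sparsify(grad, budget, rng=np.random.default_rng(7)):
        """Keep coordinate i with probability p_i and rescale by 1/p_i, so the result
        is an unbiased estimate of grad. Here p_i is proportional to |grad_i|, capped
        at 1, with `budget` controlling the expected number of kept coordinates."""
        p = np.minimum(1.0, budget * np.abs(grad) / np.sum(np.abs(grad)))
        keep = rng.random(grad.shape) < p
        out = np.zeros_like(grad)
        out[keep] = grad[keep] / p[keep]
        return out

    g = np.random.default_rng(8).normal(size=1000)
    print(np.count_nonzero(random_sparsify(g, budget=50)))  # about 50 nonzeros in expectation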

SQuARM-SGD: Communication-efficient momentum SGD for decentralized optimization

N Singh, D Data, J George… - IEEE Journal on Selected …, 2021 - ieeexplore.ieee.org
In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm
for decentralized training of large-scale machine learning models over a network. In …
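
SQuARM-SGD combines momentum with compressed, infrequent communication in a decentralized setting. The sketch below is only a loose, centralized stand-in: each worker keeps a local momentum buffer and exchanges a top-k-compressed, error-compensated update; the paper's gossip averaging, quantization, and event-triggered communication rule are omitted, and all constants are illustrative assumptions.

    import numpy as np

    def topk(v, k):
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(9)
    d, workers, k, lr, beta, T = 100, 4, 10, 0.05, 0.9, 400
    x = np.zeros(d)
    mom = [np.zeros(d) for _ in range(workers)]   # per-worker momentum buffers
    err = [np.zeros(d) for _ in range(workers)]   # per-worker compression error
    target = rng.normal(size=d)                   # optimum of the toy quadratic loss

    for _ in range(T):
        sent = []
        for w in range(workers):
            g = (x - target) + 0.1 * rng.normal(size=d)
            mom[w] = beta * mom[w] + g            # local momentum update
            v = err[w] + lr * mom[w]
            c = topk(v, k)                        # only a compressed update is exchanged
            err[w] = v - c
            sent.append(c)
        x -= np.mean(sent, axis=0)                # plain averaging stands in for gossip here
    print("distance to optimum:", np.linalg.norm(x - target))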