Double quantization for communication-efficient distributed optimization

Y Yu, J Wu, L Huang - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Modern distributed training of machine learning models often suffers from high
communication overhead for synchronizing stochastic gradients and model parameters. In …
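
A minimal sketch of the generic idea the title points at, compressing both the gradients workers upload and the parameters the server broadcasts back; the quantizer, bit width, and function names below are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

def stochastic_uniform_quantize(x, num_levels=16):
    """Uniform quantizer with stochastic rounding (unbiased in expectation).

    Returns the dequantized vector; a real system would transmit the
    integer codes plus the (lo, hi) range instead of full-precision values.
    """
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:                       # constant vector: nothing to quantize
        return x.copy()
    scale = (hi - lo) / (num_levels - 1)
    normalized = (x - lo) / scale      # map into [0, num_levels - 1]
    floor = np.floor(normalized)
    codes = floor + (np.random.rand(*x.shape) < (normalized - floor))
    return lo + codes * scale

# Toy round trip: the worker quantizes its gradient, the server quantizes
# the updated parameters it sends back; that is the "double" in the title.
rng = np.random.default_rng(0)
params = rng.normal(size=1000)
grad = rng.normal(size=1000)

q_grad = stochastic_uniform_quantize(grad)       # worker -> server
params = params - 0.1 * q_grad                   # server-side SGD step
q_params = stochastic_uniform_quantize(params)   # server -> worker
print("grad error:", np.abs(q_grad - grad).mean(),
      "param error:", np.abs(q_params - params).mean())
```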

A linear speedup analysis of distributed deep learning with sparse and quantized communication

P Jiang, G Agrawal - Advances in Neural Information …, 2018 - proceedings.neurips.cc
The large communication overhead has imposed a bottleneck on the performance of
distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous …

Communication-efficient distributed learning via lazily aggregated quantized gradients

J Sun, T Chen, G Giannakis… - Advances in Neural …, 2019 - proceedings.neurips.cc
The present paper develops a novel aggregated gradient approach for distributed machine
learning that adaptively compresses the gradient communication. The key idea is to first …
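
The snippet cuts off before the key idea is stated; as a rough illustration of lazy aggregation in general (not LAQ's actual skipping criterion), the sketch below lets each worker skip an upload whenever its gradient has changed little since the last one it sent, so the server reuses the stale copy. The threshold rule is a placeholder assumption.

```python
import numpy as np

class LazyWorker:
    """Uploads a gradient only when it has changed enough since the last upload.

    Generic lazy-aggregation rule, not the paper's exact criterion;
    `threshold` is an illustrative hyperparameter.
    """
    def __init__(self, dim, threshold=0.05):
        self.last_sent = np.zeros(dim)
        self.threshold = threshold

    def maybe_upload(self, grad):
        innovation = np.linalg.norm(grad - self.last_sent)
        if innovation >= self.threshold * np.linalg.norm(grad):
            self.last_sent = grad.copy()
            return grad, True          # communicate a fresh gradient
        return self.last_sent, False   # server reuses the stale copy

# Server aggregates whatever each worker reports (fresh or stale).
rng = np.random.default_rng(1)
workers = [LazyWorker(dim=10) for _ in range(4)]
for step in range(3):
    reports = [w.maybe_upload(rng.normal(size=10)) for w in workers]
    avg_grad = np.mean([g for g, _ in reports], axis=0)
    sent = sum(fresh for _, fresh in reports)
    print(f"step {step}: {sent}/4 workers communicated")
```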

Convex optimization using sparsified stochastic gradient descent with memory

JB Cordonnier - 2018 - infoscience.epfl.ch
Interest in distributed stochastic optimization has risen with the need to train complex Machine
Learning models with more data on distributed systems. Increasing the computation power …
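
Top-k sparsification with a local memory (error) vector is the pattern named in the title; a minimal sketch of that generic pattern, with an arbitrary k and no claim to match the thesis's analysis, follows.

```python
import numpy as np

def topk_with_memory(grad, memory, k):
    """Keep only the k largest-magnitude entries of (grad + memory);
    the rest are stored back into the memory and re-sent later."""
    corrected = grad + memory
    idx = np.argpartition(np.abs(corrected), -k)[-k:]   # indices of top-k magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]
    new_memory = corrected - sparse                     # what was left out this round
    return sparse, new_memory

rng = np.random.default_rng(2)
memory = np.zeros(1000)
for step in range(5):
    grad = rng.normal(size=1000)
    sparse_grad, memory = topk_with_memory(grad, memory, k=10)
    # only the 10 nonzero values of `sparse_grad` (plus indices) need to be sent
print("residual kept locally:", np.linalg.norm(memory))
```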

Sparse communication for training deep networks

NF Eghlidi, M Jaggi - arXiv preprint arXiv:2009.09271, 2020 - arxiv.org
Synchronous stochastic gradient descent (SGD) is the most common method used for
distributed training of deep learning models. In this algorithm, each worker shares its local …

Bagua: scaling up distributed learning with system relaxations

S Gan, X Lian, R Wang, J Chang, C Liu, H Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent years have witnessed a growing list of systems for distributed data-parallel training.
Existing systems largely fit into two paradigms, i.e., parameter server and MPI-style collective …

Error compensated quantized SGD and its applications to large-scale distributed optimization

J Wu, W Huang, J Huang… - … Conference on Machine …, 2018 - proceedings.mlr.press
Large-scale distributed optimization is of great importance in various applications. For data-
parallel distributed learning, the inter-node gradient communication often becomes …
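
Error compensation follows the same feedback pattern as the sparsification-with-memory entry above, but with a quantizer in place of top-k selection; a rough sketch using a simple sign-and-mean quantizer (an assumption, not the paper's quantization scheme):

```python
import numpy as np

def sign_quantize(x):
    """Cheap 1-bit style quantizer: keep only the sign pattern,
    rescaled so the output preserves the mean magnitude of the input."""
    return np.sign(x) * np.abs(x).mean()

def error_compensated_step(grad, error):
    """Quantize (gradient + accumulated error); carry the new
    quantization error over to the next iteration."""
    corrected = grad + error
    compressed = sign_quantize(corrected)
    return compressed, corrected - compressed

rng = np.random.default_rng(3)
error = np.zeros(1000)
for step in range(5):
    grad = rng.normal(size=1000)
    compressed, error = error_compensated_step(grad, error)
    # `compressed` (a sign pattern plus one scalar) is all that crosses the network
print("accumulated error norm:", np.linalg.norm(error))
```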

Layer-wise adaptive gradient sparsification for distributed deep learning with convergence guarantees

S Shi, Z Tang, Q Wang, K Zhao, X Chu - arXiv preprint arXiv:1911.08727, 2019 - arxiv.org
To reduce the long training time of large deep neural network (DNN) models, distributed
synchronous stochastic gradient descent (S-SGD) is commonly used on a cluster of workers …
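
"Layer-wise" here means the top-k selection is applied per layer rather than over the flattened model; a toy sketch of that per-layer selection follows, with fixed ratios standing in for the paper's adaptive, convergence-guided choice.

```python
import numpy as np

def topk(v, k):
    """Zero out everything but the k largest-magnitude entries of v."""
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

def layerwise_sparsify(layer_grads, ratios):
    """Apply a (possibly different) sparsification ratio to each layer's
    gradient; choosing `ratios` adaptively is the paper's contribution,
    so they are simply fixed here for illustration."""
    return {name: topk(g, max(1, int(ratios[name] * g.size)))
            for name, g in layer_grads.items()}

rng = np.random.default_rng(4)
grads = {"conv1": rng.normal(size=500), "fc": rng.normal(size=2000)}
ratios = {"conv1": 0.05, "fc": 0.01}       # illustrative per-layer densities
sparse = layerwise_sparsify(grads, ratios)
print({name: int(np.count_nonzero(g)) for name, g in sparse.items()})
```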

JointSQ: Joint Sparsification-Quantization for Distributed Learning

W Xie, H Li, J Ma, Y Li, J Lei, D Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Gradient sparsification and quantization offer a promising way to alleviate the
communication overhead problem in distributed learning. However, direct combination of the …
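
The snippet notes that naively stacking the two compressors is problematic; for context on what a joint design has to improve on, here is a sketch of the naive sparsify-then-quantize baseline (not JointSQ itself).

```python
import numpy as np

def naive_sparsify_then_quantize(grad, k, num_levels=16):
    """Baseline pipeline: keep the top-k magnitude entries, then uniformly
    quantize just those k surviving values. The argument for a joint design
    is that choosing the two stages independently like this is suboptimal."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]     # indices of top-k entries
    vals = grad[idx]
    lo, hi = float(vals.min()), float(vals.max())
    scale = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    codes = np.round((vals - lo) / scale).astype(np.uint8)
    return idx, codes, lo, scale                     # what would be transmitted

rng = np.random.default_rng(5)
grad = rng.normal(size=4096)
idx, codes, lo, scale = naive_sparsify_then_quantize(grad, k=64)
recovered = np.zeros_like(grad)
recovered[idx] = lo + codes * scale                  # receiver-side reconstruction
print("compression error:", float(np.linalg.norm(recovered - grad)))
```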

Dynamic layer-wise sparsification for distributed deep learning

H Zhang, T Wu, Z Ma, F Li, J Liu - Future Generation Computer Systems, 2023 - Elsevier
Distributed stochastic gradient descent (SGD) algorithms are becoming popular for speeding
up deep learning model training by employing multiple computational devices (named …