Edge-cloud polarization and collaboration: A comprehensive survey for AI

J Yao, S Zhang, Y Yao, F Wang, J Ma… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Influenced by the great success of deep learning via cloud computing and the rapid
development of edge chips, research in artificial intelligence (AI) has shifted to both of the …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …
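
As a concrete point of reference for the methods such a survey covers, the sketch below shows uniform symmetric quantization of a weight tensor to signed 8-bit integers. The per-tensor max-abs scale and the clipping range are illustrative assumptions, not a particular method from the chapter.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Uniform symmetric quantization of a float tensor to int8 codes.

    The per-tensor scale maps the largest magnitude to 127; this is one common
    convention among several discussed in the quantization literature.
    """
    scale = max(float(np.max(np.abs(weights))) / 127.0, 1e-12)
    codes = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize_int8(codes: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 codes."""
    return codes.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
codes, scale = quantize_int8(w)
print(np.max(np.abs(w - dequantize_int8(codes, scale))))  # small reconstruction error
```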

NeRV: Neural representations for videos

H Chen, B He, H Wang, Y Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
We propose a novel neural representation for videos (NeRV) which encodes videos in
neural networks. Unlike conventional representations that treat videos as frame sequences …
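
The core idea is an index-to-frame mapping: a network is fit to one video so that a positionally encoded frame index decodes to the full frame. The toy PyTorch sketch below uses a plain MLP decoder and illustrative layer sizes; the actual NeRV architecture decodes through convolutional upsampling blocks.

```python
import torch
import torch.nn as nn

class TinyIndexToFrame(nn.Module):
    """Toy index-to-frame network in the spirit of NeRV.

    The plain-MLP decoder and all sizes here are illustrative assumptions;
    NeRV itself uses convolutional upsampling blocks after the MLP stem.
    """
    def __init__(self, num_freqs: int = 8, height: int = 32, width: int = 32):
        super().__init__()
        self.num_freqs, self.height, self.width = num_freqs, height, width
        self.net = nn.Sequential(
            nn.Linear(2 * num_freqs, 256), nn.GELU(),
            nn.Linear(256, 256), nn.GELU(),
            nn.Linear(256, 3 * height * width), nn.Sigmoid(),
        )

    def embed(self, t: torch.Tensor) -> torch.Tensor:
        # Sinusoidal positional encoding of the normalized frame index t in [0, 1].
        freqs = 2.0 ** torch.arange(self.num_freqs, dtype=t.dtype, device=t.device)
        angles = torch.pi * t[:, None] * freqs[None, :]
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        return self.net(self.embed(t)).view(-1, 3, self.height, self.width)

# "Encoding" a video means fitting the network to its frames with a reconstruction
# loss; decoding frame i is then one forward pass with t = i / (num_frames - 1).
model = TinyIndexToFrame()
frame = model(torch.tensor([0.5]))  # shape: (1, 3, 32, 32)
```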

Communication compression techniques in distributed deep learning: A survey

Z Wang, M Wen, Y Xu, Y Zhou, JH Wang… - Journal of Systems …, 2023 - Elsevier
Training data and neural network models are becoming increasingly large, and the
training time of deep learning on a single machine becomes unbearably long. To reduce …
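
One of the simplest techniques that falls under this heading is top-k sparsification: each worker transmits only the k largest-magnitude gradient entries as (index, value) pairs. The sketch below assumes a flattened gradient and omits error feedback for brevity.

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries of a flattened gradient.

    Returns (indices, values); only these are communicated. Error feedback
    (locally accumulating the dropped mass) is omitted for brevity.
    """
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def desparsify(idx: np.ndarray, vals: np.ndarray, shape) -> np.ndarray:
    """Receiver side: rebuild a dense (mostly zero) gradient from the message."""
    out = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    out[idx] = vals
    return out.reshape(shape)

g = np.random.randn(1024).astype(np.float32)
idx, vals = topk_sparsify(g, k=32)            # roughly 3% of the entries are sent
g_hat = desparsify(idx, vals, g.shape)
```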

Adaptive quantization of model updates for communication-efficient federated learning

D Jhunjhunwala, A Gadhikar, G Joshi… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Communication of model updates between client nodes and the central aggregating server
is a major bottleneck in federated learning, especially in bandwidth-limited settings and high …
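
The common pattern is that each client quantizes its update to b bits per entry before uploading, and an adaptive scheme varies b over the course of training. The stochastic uniform quantizer below is a generic building block under that pattern; the adaptation rule from the cited paper is not reproduced here.

```python
import numpy as np

def stochastic_quantize(update: np.ndarray, bits: int):
    """Stochastic uniform quantization of a model update to `bits` bits per entry.

    Unbiased rounding keeps the aggregated update unbiased; how `bits` is chosen
    per round (the adaptive part) is deliberately left out of this sketch.
    """
    levels = 2 ** bits - 1
    lo, hi = float(update.min()), float(update.max())
    scale = max((hi - lo) / levels, 1e-12)
    x = (update - lo) / scale                                   # map into [0, levels]
    floor = np.floor(x)
    codes = floor + (np.random.rand(*update.shape) < (x - floor))
    return codes.astype(np.uint32), lo, scale

def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale + lo

update = 0.01 * np.random.randn(1000).astype(np.float32)
codes, lo, scale = stochastic_quantize(update, bits=4)          # 4 bits vs. 32 per entry
approx = dequantize(codes, lo, scale)
```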

What do we mean by generalization in federated learning?

H Yuan, W Morningstar, L Ning, K Singhal - arXiv preprint arXiv …, 2021 - arxiv.org
Federated learning data is drawn from a distribution of distributions: clients are drawn from a
meta-distribution, and their data are drawn from local data distributions. Thus generalization …
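
The "distribution of distributions" framing can be made concrete with a toy two-level sampler: the meta-distribution produces a client (here reduced to a per-client mean), and that client's local distribution produces its examples, so generalization has to be assessed both on held-out data from participating clients and on entirely unseen clients. The Gaussians and parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_client() -> float:
    """Draw a client from the meta-distribution (here: a per-client mean)."""
    return rng.normal(loc=0.0, scale=1.0)

def sample_local_data(client_mean: float, n: int) -> np.ndarray:
    """Draw n examples from that client's local data distribution."""
    return rng.normal(loc=client_mean, scale=0.5, size=n)

# Two distinct generalization questions: new data from clients seen in training
# (participating clients) versus data from fresh clients from the meta-distribution.
train_clients = [sample_client() for _ in range(10)]
heldout_from_seen_client = sample_local_data(train_clients[0], n=100)
data_from_unseen_client = sample_local_data(sample_client(), n=100)
```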

Variance reduction is an antidote to Byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top

E Gorbunov, S Horváth, P Richtárik, G Gidel - arXiv preprint arXiv …, 2022 - arxiv.org
Byzantine-robustness has been gaining a lot of attention due to the growing interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …

AC-SGD: Adaptively compressed SGD for communication-efficient distributed learning

G Yan, T Li, SL Huang, T Lan… - IEEE Journal on Selected …, 2022 - ieeexplore.ieee.org
Gradient compression (e.g., gradient quantization and gradient sparsification) is a core
technique for reducing communication costs in distributed learning systems. The recent trend …
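
A component that frequently accompanies compressed SGD, whatever the compressor, is error feedback: the part of the gradient dropped by compression is kept locally and re-injected at the next step. The sketch below is that generic wrapper around a top-k compressor, not a reproduction of the AC-SGD scheme itself.

```python
import numpy as np

def topk(v: np.ndarray, k: int) -> np.ndarray:
    """Top-k compressor; any biased compressor could play this role."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

class ErrorFeedbackWorker:
    """Generic error-feedback wrapper used with compressed gradients."""

    def __init__(self, dim: int, k: int):
        self.residual = np.zeros(dim, dtype=np.float32)
        self.k = k

    def step(self, grad: np.ndarray) -> np.ndarray:
        corrected = grad + self.residual       # re-inject previously dropped mass
        message = topk(corrected, self.k)      # what actually goes on the wire
        self.residual = corrected - message    # keep the compression error locally
        return message

worker = ErrorFeedbackWorker(dim=1024, k=32)
msg = worker.step(np.random.randn(1024).astype(np.float32))
```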

Optimus-CC: Efficient large NLP model training with 3D parallelism aware communication compression

J Song, J Yim, J Jung, H Jang, HJ Kim, Y Kim… - Proceedings of the 28th …, 2023 - dl.acm.org
When training modern large natural language processing (NLP) models, it has become
common practice to split the model across multiple GPUs using 3D parallelism. Such a technique …

Fast optimal locally private mean estimation via random projections

H Asi, V Feldman, J Nelson… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of locally private mean estimation of high-dimensional vectors in the
Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have …
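
For orientation, the simplest locally private baseline for this task has every client add calibrated Gaussian noise to its (projected) vector and the server average the noisy reports; the random-projection algorithms of the paper improve on such baselines. The calibration below is the classical Gaussian mechanism with L2 sensitivity 2*radius and is only a baseline sketch, not the paper's method.

```python
import numpy as np

def ldp_gaussian_report(x: np.ndarray, radius: float, eps: float, delta: float) -> np.ndarray:
    """Locally private report of a vector from the L2 ball of the given radius.

    Classical Gaussian mechanism with L2 sensitivity 2*radius (valid for eps < 1);
    a simple baseline, not the random-projection algorithm of the cited paper.
    """
    x = x * min(1.0, radius / (np.linalg.norm(x) + 1e-12))   # enforce the ball constraint
    sigma = 2.0 * radius * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return x + np.random.normal(scale=sigma, size=x.shape)

# Server side: the average of the noisy reports estimates the true mean, with error
# growing with the dimension and shrinking with the number of clients.
clients = [np.random.randn(50) / 10.0 for _ in range(1000)]
reports = [ldp_gaussian_report(c, radius=1.0, eps=0.5, delta=1e-6) for c in clients]
estimate = np.mean(reports, axis=0)
```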