1-bit adam: Communication efficient large-scale training with adam’s convergence speed

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

被引用次数：354 相关文章所有 3 个版本

[PDF] ieee.org

Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org

Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

被引用次数：100 相关文章所有 5 个版本

[PDF] thecvf.com

Scaling vision transformers

X Zhai, A Kolesnikov, N Houlsby… - Proceedings of the …, 2022 - openaccess.thecvf.com

Attention-based neural networks such as the Vision Transformer (ViT) have recently attained
state-of-the-art results on many computer vision benchmarks. Scale is a primary ingredient …

被引用次数：1111 相关文章所有 9 个版本

[PDF] mlr.press

Cocktailsgd: Fine-tuning foundation models over 500mbps networks

J Wang, Y Lu, B Yuan, B Chen… - International …, 2023 - proceedings.mlr.press

Distributed training of foundation models, especially large language models (LLMs), is
communication-intensive and so has heavily relied on centralized data centers with fast …

被引用次数：28 相关文章所有 4 个版本

[PDF] springer.com

The effect of choosing optimizer algorithms to improve computer vision tasks: a comparative study

E Hassan, MY Shams, NA Hikal, S Elmougy - Multimedia Tools and …, 2023 - Springer

Optimization algorithms are used to improve model accuracy. The optimization process
undergoes multiple cycles until convergence. A variety of optimization strategies have been …

被引用次数：114 相关文章所有 12 个版本

[PDF] mlr.press

Communication-efficient adaptive federated learning

Y Wang, L Lin, J Chen - International Conference on …, 2022 - proceedings.mlr.press

Federated learning is a machine learning training paradigm that enables clients to jointly
train models without sharing their own localized data. However, the implementation of …

被引用次数：77 相关文章所有 4 个版本

[PDF] researchgate.net

[PDF][PDF] Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv …, 2023 - researchgate.net

Abstract Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …

被引用次数：81 相关文章所有 7 个版本

[PDF] jmlr.org

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org

Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

被引用次数：40 相关文章所有 4 个版本

[PDF] arxiv.org

Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org

Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (eg, GPUs/TPUs) due to larger models and …

被引用次数：144 相关文章所有 4 个版本

[PDF] arxiv.org

On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org

The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

被引用次数：28 相关文章所有 2 个版本

高级搜索

QQ 群