Just one byte (per gradient): A note on low-bandwidth decentralized language model finetuning using shared randomness

E Zelikman, Q Huang, P Liang, N Haber… - arXiv preprint arXiv …, 2023 - arxiv.org
Language model training in distributed settings is limited by the communication cost of
gradient exchanges. In this short note, we extend recent work from Malladi et al. (2023) …
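
The idea behind the title: with a MeZO-style zeroth-order estimator, a worker that shares a random seed with its peers only needs to transmit the scalar projected gradient, since the perturbation direction can be regenerated locally from the seed. Below is a minimal sketch of that mechanism, not the paper's exact protocol; the toy objective and all names are illustrative.

import numpy as np

def zo_projected_grad(loss_fn, theta, seed, eps=1e-3):
    # Two-point zeroth-order estimate of the directional derivative
    # along a perturbation z regenerated from a shared seed.
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    return (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)

def apply_remote_update(theta, scalar_grad, seed, lr=1e-2):
    # Any worker holding the same seed can reconstruct the full update
    # from just (seed, scalar_grad).
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    return theta - lr * scalar_grad * z

# Worker A estimates a gradient on its local data and sends only
# (seed, scalar_grad) -- a few bytes -- instead of a full gradient vector.
theta = np.zeros(10)
loss_fn = lambda p: float(np.sum((p - 1.0) ** 2))  # toy objective
seed = 42
g = zo_projected_grad(loss_fn, theta, seed)

# Worker B, holding the same theta and seed, applies the identical update.
theta = apply_remote_update(theta, g, seed)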

CO2: Efficient distributed training with full communication-computation overlap

W Sun, Z Qin, W Sun, S Li, D Li, X Shen, Y Qiao… - arXiv preprint arXiv …, 2024 - arxiv.org
The fundamental success of large language models hinges upon the efficacious
implementation of large-scale distributed training techniques. Nevertheless, building a vast …

Asynchronous Local-SGD Training for Language Modeling

B Liu, R Chhaparia, A Douillard, S Kale… - arXiv preprint arXiv …, 2024 - arxiv.org
Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is
an approach to distributed optimization where each device performs more than one SGD …
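
As the snippet says, each device takes several SGD steps from the same starting point before models are synchronized by averaging. A minimal synchronous sketch follows (the paper studies the asynchronous variant, which is more involved); the least-squares objective and all names are illustrative.

import numpy as np

def local_sgd_round(theta, worker_data, local_steps=8, lr=0.1):
    # One synchronous Local-SGD (federated averaging) round:
    # each worker takes several SGD steps locally, then the
    # resulting models are averaged in a single communication.
    local_models = []
    for X, y in worker_data:
        w = theta.copy()
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
            w -= lr * grad
        local_models.append(w)
    return np.mean(local_models, axis=0)

# Toy setup: two workers with different local datasets.
rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
worker_data = []
for _ in range(2):
    X = rng.standard_normal((32, 5))
    worker_data.append((X, X @ w_true + 0.01 * rng.standard_normal(32)))

theta = np.zeros(5)
for _ in range(20):
    theta = local_sgd_round(theta, worker_data)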

DiLoCo: Distributed low-communication training of language models

A Douillard, Q Feng, AA Rusu, R Chhaparia… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have become a critical component in many applications of
machine learning. However, standard approaches to training LLMs require a large number of …
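
For context, the communication pattern DiLoCo-style methods build on: workers optimize locally for many steps, the averaged parameter delta is treated as a pseudo-gradient, and an outer optimizer applies it to the shared model, so communication happens only once per outer round. A simplified sketch with plain momentum as the outer optimizer (the paper's specific inner and outer optimizers differ); all names are illustrative.

import numpy as np

def outer_round(theta, workers, inner_steps=50, inner_lr=0.05,
                outer_lr=0.7, momentum=None, beta=0.9):
    # One communication round of a local-update scheme with an outer
    # optimizer: each worker optimizes locally for many steps, the
    # averaged parameter delta is used as a pseudo-gradient, and
    # momentum SGD applies it to the shared model.
    if momentum is None:
        momentum = np.zeros_like(theta)
    deltas = []
    for X, y in workers:
        w = theta.copy()
        for _ in range(inner_steps):
            w -= inner_lr * 2 * X.T @ (X @ w - y) / len(y)
        deltas.append(theta - w)            # pseudo-gradient from this worker
    pseudo_grad = np.mean(deltas, axis=0)   # only deltas are communicated
    momentum = beta * momentum + pseudo_grad
    return theta - outer_lr * momentum, momentum

# Toy usage with a single worker on a least-squares problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5)); y = X @ rng.standard_normal(5)
theta, m = np.zeros(5), None
for _ in range(10):
    theta, m = outer_round(theta, [(X, y)], momentum=m)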

Federated full-parameter tuning of billion-sized language models with communication cost under 18 kilobytes

Z Qin, D Chen, B Qian, B Ding, Y Li, S Deng - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained large language models (LLMs) require fine-tuning to improve their
responsiveness to natural language instructions. Federated learning (FL) offers a way to …

SLoRA: Federated parameter efficient fine-tuning of language models

S Babakniya, AR Elkordy, YH Ezzeldin, Q Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Transfer learning via fine-tuning pre-trained transformer models has gained significant
success in delivering state-of-the-art results across various NLP tasks. In the absence of …
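
SLoRA builds on LoRA-style parameter-efficient fine-tuning: a frozen pretrained weight matrix is augmented with a trainable low-rank update, so only a small adapter is trained and, in the federated setting, communicated. A minimal sketch of a LoRA-style linear layer in PyTorch, not the paper's federated procedure; shapes and names are illustrative.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # A frozen linear layer plus a trainable low-rank update:
    # y = W x + (alpha / r) * B A x, with only A and B trained.
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T

layer = LoRALinear(768, 768, r=8)
trainable = [p for p in layer.parameters() if p.requires_grad]
# Only lora_A and lora_B (2 * 8 * 768 values) are trained and, in a
# federated setting, exchanged -- far smaller than the full 768x768 weight.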

Distributed inference and fine-tuning of large language models over the internet

A Borzunov, M Ryabinin… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) are useful in many NLP tasks and become more capable
with size, with the best open-source models having over 50 billion parameters. However …

Fast distributed inference serving for large language models

B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) power a new generation of interactive AI applications
exemplified by ChatGPT. The interactive nature of these applications demands low latency …

A better alternative to error feedback for communication-efficient distributed learning

S Horváth, P Richtárik - arXiv preprint arXiv:2006.11077, 2020 - arxiv.org
Modern large-scale machine learning applications require stochastic optimization
algorithms to be implemented on distributed compute systems. A key bottleneck of such …
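
For context, classic error feedback compensates for a biased gradient compressor by carrying the compression residual into the next step; the paper proposes an alternative to this mechanism, which the sketch below does not reproduce. A minimal single-worker sketch of plain error feedback with top-k compression; all names are illustrative.

import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude entries of v, zero out the rest.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_step(theta, grad, error, lr=0.1, k=2):
    # One worker-side step of classic error feedback:
    # compress (scaled gradient + carried error), transmit the compressed
    # vector, and keep the residual as the new error memory.
    corrected = lr * grad + error
    compressed = top_k(corrected, k)   # what actually gets communicated
    error = corrected - compressed     # residual carried to the next step
    return theta - compressed, error

theta = np.array([1.0, -2.0, 0.5, 3.0])
error = np.zeros_like(theta)
for _ in range(100):
    grad = 2 * theta                   # toy quadratic objective ||theta||^2
    theta, error = ef_step(theta, grad, error)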

On the convergence of zeroth-order federated tuning for large language models

Z Ling, D Chen, L Yao, Y Li, Y Shen - Proceedings of the 30th ACM …, 2024 - dl.acm.org
The confluence of Federated Learning (FL) and Large Language Models (LLMs) is ushering
in a new era in privacy-preserving natural language processing. However, the intensive …