Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness

E Zelikman, Q Huang, P Liang, N Haber… - arXiv preprint arXiv …, 2023 - arxiv.org
Language model training in distributed settings is limited by the communication cost of
gradient exchanges. In this short note, we extend recent work from Malladi et al. (2023) …

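The core idea behind the title: if every worker holds the same PRNG seed, each can regenerate the others' random perturbation directions locally, so only a scalar projected gradient (quantizable to a single byte) needs to cross the network per step. Below is a minimal Python sketch of such a MeZO-style SPSA step under shared randomness; the function names, toy loss, and constants are illustrative assumptions, not the authors' code.

import numpy as np

def toy_loss(theta):
    # Stand-in objective; a real setup would evaluate an LM's loss
    # on a worker's local data shard.
    return 0.5 * float(np.sum(theta ** 2))

def projected_grad(theta, seed, eps=1e-3):
    # Directional derivative along a random direction z regenerated
    # from the shared seed. This scalar is the only quantity a worker
    # must transmit (quantized to one byte in the paper's setting).
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    return (toy_loss(theta + eps * z) - toy_loss(theta - eps * z)) / (2 * eps)

def apply_step(theta, seed, g, lr=1e-2):
    # Every worker reconstructs the identical z from the seed and
    # applies the same SGD-like update, keeping replicas in sync.
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    return theta - lr * g * z

theta = np.ones(4)
seed = 42                          # agreed upon by all workers in advance
g = projected_grad(theta, seed)    # the one scalar sent over the wire
theta = apply_step(theta, seed, g)

Because the perturbation direction never needs to be communicated, per-step bandwidth is independent of the model's parameter count, which is what makes the "one byte per gradient" claim possible.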