E Zelikman, Q Huang, P Liang, N Haber… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Abstract: Language model training in distributed settings is limited by the communication cost
of gradient exchanges. In this short note, we extend recent work from Malladi et al. (2023) …