T Zhu, F He, K Chen, M Song… - … Conference on Machine …, 2023 - proceedings.mlr.press
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across massive numbers of devices simultaneously, without the control of a central server. However, existing …
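As a rough illustration of the D-SGD pattern this snippet describes, the sketch below has each worker take a local gradient step and then average its parameters only with its graph neighbors, so no central server is involved. The ring topology, uniform mixing weights, toy dimensions, and learning rate are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def dsgd_step(params, grads, lr, neighbors, weights):
    """One D-SGD round: local gradient step on every worker, then gossip
    averaging with graph neighbors (no central server)."""
    # Local SGD update on every worker.
    updated = [p - lr * g for p, g in zip(params, grads)]
    # Gossip: each worker mixes its parameters with its neighbors'.
    mixed = []
    for i, p in enumerate(updated):
        acc = weights[i][i] * p
        for j in neighbors[i]:
            acc += weights[i][j] * updated[j]
        mixed.append(acc)
    return mixed

# Toy example: 4 workers on a ring, each holding a 3-dim parameter vector.
rng = np.random.default_rng(0)
params = [rng.normal(size=3) for _ in range(4)]
grads = [rng.normal(size=3) for _ in range(4)]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
# Uniform weights over self + neighbors; symmetric, hence doubly stochastic here.
w = {i: {j: 1.0 / 3.0 for j in [i] + ring[i]} for i in range(4)}
params = dsgd_step(params, grads, lr=0.1, neighbors=ring, weights=w)
```

The doubly stochastic mixing matrix is the usual choice in D-SGD analyses because repeated gossip rounds then drive all workers toward the same average model.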
Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically …
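A minimal sketch of the Local SGD scheme the snippet describes, assuming a simple NumPy simulation: each worker runs SGD independently on its own data shard, and the models are averaged only every `sync_every` steps rather than every step. The least-squares loss, number of workers, and synchronization period are placeholders.

```python
import numpy as np

def local_sgd(workers_data, w0, lr=0.1, local_steps=50, sync_every=10):
    """Local SGD: each worker runs SGD independently and the models are
    averaged every `sync_every` steps instead of after every step."""
    models = [w0.copy() for _ in workers_data]
    for t in range(local_steps):
        for k, (X, y) in enumerate(workers_data):
            # Least-squares gradient on this worker's shard (illustrative loss).
            grad = X.T @ (X @ models[k] - y) / len(y)
            models[k] -= lr * grad
        if (t + 1) % sync_every == 0:
            # Communication round: average parameters across workers.
            avg = np.mean(models, axis=0)
            models = [avg.copy() for _ in models]
    return np.mean(models, axis=0)

# Usage: 4 workers, each with 100 samples of a shared linear-regression task.
rng = np.random.default_rng(1)
w_true = rng.normal(size=5)
shards = []
for _ in range(4):
    X = rng.normal(size=(100, 5))
    shards.append((X, X @ w_true + 0.01 * rng.normal(size=100)))
w_hat = local_sgd(shards, w0=np.zeros(5))
```

The communication saving comes from the `sync_every` period: communication rounds scale as `local_steps / sync_every` rather than `local_steps`.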
In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work …
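For contrast with the periodic schemes above, here is a sketch of the fully synchronous data-parallel baseline the snippet refers to: at every step each worker computes a gradient on its shard, and the gradients are averaged (an all-reduce) before a single shared update, so communication grows with the number of steps and nodes. The toy least-squares model is an assumption for illustration.

```python
import numpy as np

def sync_data_parallel_sgd(shards, w0, lr=0.1, steps=50):
    """Fully synchronous data parallelism: gradients are averaged across all
    workers at every step (one all-reduce per step)."""
    w = w0.copy()
    for _ in range(steps):
        grads = []
        for X, y in shards:
            grads.append(X.T @ (X @ w - y) / len(y))  # local gradient
        # This per-step averaging is the communication that becomes
        # expensive as the number of nodes grows.
        w -= lr * np.mean(grads, axis=0)
    return w
```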
Motivated by learning of correlated equilibria in non-cooperative games, we perform a large deviations analysis of a regret-minimizing stochastic approximation algorithm. The regret …
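The snippet does not specify which regret-minimizing procedure is analyzed; as a concrete stand-in, the sketch below implements Hart and Mas-Colell's conditional regret matching, a standard dynamic whose empirical joint play converges to the set of correlated equilibria. The two-player setup, the Chicken payoff matrices, and all parameter values are illustrative assumptions; this is not presented as the paper's algorithm or its large deviations analysis.

```python
import numpy as np

def conditional_regret_matching(A, B, rounds=50000, seed=0):
    """Hart & Mas-Colell regret matching: each player switches away from its
    previous action with probability proportional to positive conditional
    regret; the empirical distribution of joint play converges to the set of
    correlated equilibria."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # reg_row[j, k]: cumulative regret, over rounds where the row player chose
    # j, of not having chosen k instead (similarly for the column player).
    reg_row, reg_col = np.zeros((m, m)), np.zeros((n, n))
    mu_row = max(2.0 * np.abs(A).max() * (m - 1), 1.0)  # normalization constants
    mu_col = max(2.0 * np.abs(B).max() * (n - 1), 1.0)
    i, j = rng.integers(m), rng.integers(n)  # arbitrary first actions
    counts = np.zeros((m, n))
    for t in range(1, rounds + 1):
        counts[i, j] += 1
        # Update conditional regrets given the realized joint action (i, j).
        reg_row[i, :] += A[:, j] - A[i, j]
        reg_col[j, :] += B[i, :] - B[i, j]
        # Next round: switch from the previous action with probability
        # proportional to positive average regret, otherwise stay put.
        p = np.maximum(reg_row[i, :], 0.0) / (t * mu_row)
        p[i] = 0.0
        p[i] = 1.0 - p.sum()
        q = np.maximum(reg_col[j, :], 0.0) / (t * mu_col)
        q[j] = 0.0
        q[j] = 1.0 - q.sum()
        i, j = rng.choice(m, p=p), rng.choice(n, p=q)
    return counts / counts.sum()

# Example: the game of Chicken, a standard example in the correlated
# equilibrium literature (row payoffs A, column payoffs B).
A = np.array([[6.0, 2.0], [7.0, 0.0]])
B = np.array([[6.0, 7.0], [2.0, 0.0]])
print(conditional_regret_matching(A, B))
```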
Deep learning has achieved remarkable success in recent years, yet training neural networks often involves a delicate combination of guesswork and hyperparameter tuning. A …