Coded computing: Mitigating fundamental bottlenecks in large-scale distributed computing and machine learning

S Li, S Avestimehr - Foundations and Trends® in …, 2020 - nowpublishers.com
We introduce the concept of “coded computing”, a novel computing paradigm that utilizes
coding theory to effectively inject and leverage data/computation redundancy to mitigate …
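
As a rough illustration of the coded-computing idea (not a construction from the monograph itself), the sketch below codes a matrix-vector product with a random MDS-style generator matrix so that any k of n worker results suffice; all sizes and data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3                      # n workers, any k results suffice
A = rng.standard_normal((6, 4))  # tall matrix, split row-wise into k blocks
x = rng.standard_normal(4)

blocks = np.split(A, k)                     # A = [A1; A2; A3]
G = rng.standard_normal((n, k))             # random generator matrix (MDS w.h.p.)
coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]

# Each worker i computes its coded block times x; suppose only workers
# {0, 2, 4} return (the other two straggle).
returned = [0, 2, 4]
partials = [coded[i] @ x for i in returned]

# Decode: solve with the 3x3 submatrix of G to recover the uncoded block products.
decoded = np.linalg.solve(G[returned], np.stack(partials))
assert np.allclose(decoded.reshape(-1), A @ x)
```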

Stochastic gradient coding for straggler mitigation in distributed learning

R Bitar, M Wootters… - IEEE Journal on Selected …, 2020 - ieeexplore.ieee.org
We consider distributed gradient descent in the presence of stragglers. Recent work on
gradient coding and approximate gradient coding has shown how to add redundancy in …
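
A minimal sketch of the replication flavor of this redundancy, assuming a cyclic assignment of data partitions to workers (the paper's randomized, pairwise-balanced construction is more sophisticated): the master simply sums whatever scaled partial gradients arrive, which equals the exact gradient when no worker straggles.

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers, n_parts, r = 6, 6, 2          # each partition replicated on r workers
g = rng.standard_normal((n_parts, 3))    # per-partition gradients (toy values)

# Cyclic replication: worker i holds partitions {i, i+1, ..., i+r-1} mod n_parts.
holds = [[(i + j) % n_parts for j in range(r)] for i in range(n_workers)]

# Each responding worker sends the sum of its partial gradients, scaled by 1/r.
def worker_msg(i):
    return sum(g[p] for p in holds[i]) / r

responders = [0, 1, 2, 4, 5]             # worker 3 straggles
approx = sum(worker_msg(i) for i in responders)
exact = g.sum(axis=0)
print(np.linalg.norm(approx - exact))    # nonzero error from the straggler; exactly zero if all respond
```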

Coded distributed computing with partial recovery

E Ozfatura, S Ulukus, D Gündüz - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Coded computation techniques provide robustness against straggling workers in distributed
computing. However, most of the existing schemes require exact provisioning of the …

Straggler-aware distributed learning: Communication–computation latency trade-off

E Ozfatura, S Ulukus, D Gündüz - Entropy, 2020 - mdpi.com
When gradient descent (GD) is scaled to many parallel workers for large-scale machine
learning applications, its per-iteration computation time is limited by straggling workers …
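
A toy order-statistics simulation of that trade-off, under an assumed exponential delay model (not the paper's analysis): doubling each worker's computational load so the master only waits for the fastest k responses can still cut the per-iteration latency.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 20, 10_000
base = 1.0                                   # computation time for the full workload

# Exponential "straggling" delays on top of a deterministic compute time.
delays = rng.exponential(scale=1.0, size=(trials, n))

# Uncoded: every worker does 1/n of the work, master waits for all n.
t_uncoded = (base / n + delays).max(axis=1)

# Redundant: each worker does 1/k of the work, master only waits
# for the k fastest responses.
k = 10
t_coded = np.sort(base / k + delays, axis=1)[:, k - 1]

print("wait for all :", t_uncoded.mean())
print("wait for k=10:", t_coded.mean())
```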

Berrut approximated coded computing: Straggler resistance beyond polynomial computing

T Jahani-Nezhad… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
One of the major challenges in using distributed learning to train complicated models on
large data sets is dealing with the effect of stragglers. As a solution, coded computation has been …
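
The numerical tool this line of work builds on is Berrut's rational interpolant, which has no poles for any distinct nodes. Below is a minimal sketch of that interpolation step at assumed Chebyshev evaluation points, with two "straggling" evaluations simply dropped; it is only the decoding primitive, not the full coding scheme.

```python
import numpy as np

def berrut_interpolate(z, vals, x):
    """Berrut's first-form rational interpolant evaluated at query points x."""
    w = (-1.0) ** np.arange(len(z))           # Berrut weights
    num = np.zeros_like(x)
    den = np.zeros_like(x)
    for zi, wi, fi in zip(z, w, vals):
        c = wi / (x - zi)
        num += c * fi
        den += c
    return num / den

# Toy use: approximate f on [-1, 1] from 8 of 10 Chebyshev nodes
# (the two missing evaluations play the role of stragglers).
f = lambda t: np.tanh(3 * t)
nodes = np.cos((2 * np.arange(10) + 1) * np.pi / 20)   # Chebyshev points
keep = np.delete(np.arange(10), [3, 7])
x = np.linspace(-0.9, 0.9, 5)
print(np.abs(berrut_interpolate(nodes[keep], f(nodes[keep]), x) - f(x)))
```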

Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication

T Jahani-Nezhad… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In this paper, we propose CodedSketch, a distributed straggler-resistant scheme for
computing an approximation of the product of two massive matrices. The objective is to …
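
For context, a generic randomized sketch of an approximate matrix product (plain Gaussian sketching, not the CodedSketch construction; all sizes are illustrative): the shared inner dimension is compressed with a random matrix, and the error scales roughly like 1/sqrt(d) relative to the product of the Frobenius norms.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((200, 500))
B = rng.standard_normal((500, 150))

# Compress the shared (inner) dimension with a random sketch matrix S,
# so the product is estimated from much smaller factors.
d = 100                                       # sketch size << 500
S = rng.standard_normal((500, d)) / np.sqrt(d)
C_hat = (A @ S) @ (S.T @ B)                   # unbiased estimate of A @ B

err = np.linalg.norm(C_hat - A @ B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(f"error relative to ||A||_F ||B||_F: {err:.3f}")   # roughly 1/sqrt(d)
```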

Slow and stale gradients can win the race

S Dutta, J Wang, G Joshi - IEEE Journal on Selected Areas in …, 2021 - ieeexplore.ieee.org
Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers
from runtime delays as it waits for the slowest workers (stragglers). Asynchronous …
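
A toy illustration of the asynchronous side of that comparison, on an assumed quadratic objective (not the paper's algorithm or analysis): updates are applied with gradients computed at stale parameter values, yet the iterates still converge.

```python
import numpy as np

# Toy quadratic objective f(w) = 0.5 * ||w||^2, so the gradient is w itself.
# Asynchronous SGD applies gradients computed at stale parameters w_{t - tau}
# instead of waiting for every worker each step.
rng = np.random.default_rng(4)
eta, steps, max_delay = 0.1, 200, 5

w = np.ones(10)
history = [w.copy()]
for t in range(steps):
    tau = rng.integers(0, min(max_delay, t) + 1)     # random staleness
    stale_w = history[-1 - tau]
    grad = stale_w + 0.01 * rng.standard_normal(10)  # noisy, stale gradient
    w = w - eta * grad
    history.append(w.copy())

print("final ||w||:", np.linalg.norm(w))             # shrinks toward 0 despite staleness
```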

Straggler-resilient personalized federated learning

I Tziotis, Z Shen, R Pedarsani, H Hassani… - arXiv preprint arXiv …, 2022 - arxiv.org
Federated Learning is an emerging learning paradigm that allows training models from
samples distributed across a large network of clients while respecting privacy and …

Approximate gradient coding with optimal decoding

M Glasgow, M Wootters - IEEE Journal on Selected Areas in …, 2021 - ieeexplore.ieee.org
Gradient codes use data replication to mitigate the effect of straggling machines in
distributed machine learning. Approximate gradient codes consider codes where the data …
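
A minimal least-squares decoding sketch in that spirit, assuming a cyclic replication assignment (the paper's codes and optimal decoders are more refined): given which workers responded, the decoder picks combination weights whose action on the assignment matrix is as close as possible to the all-ones row, so the decoded vector best approximates the full gradient sum.

```python
import numpy as np

rng = np.random.default_rng(5)
n_workers, n_parts = 8, 8
g = rng.standard_normal((n_parts, 4))        # per-partition gradients (toy)

# Assignment matrix: A[i, p] = 1 if worker i holds partition p (replication 3).
A = np.zeros((n_workers, n_parts))
for i in range(n_workers):
    for j in range(3):
        A[i, (i + j) % n_parts] = 1.0

msgs = A @ g                                  # worker i sends the sum of its partial gradients
responders = [0, 1, 2, 4, 5, 7]               # workers 3 and 6 straggle

# Least-squares decoding: choose weights v so that v^T A_S is as close as
# possible to the all-ones row, i.e. v^T msgs_S approximates sum_p g[p].
A_S = A[responders]
v, *_ = np.linalg.lstsq(A_S.T, np.ones(n_parts), rcond=None)
approx = v @ msgs[responders]
exact = g.sum(axis=0)
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```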

ϵ-Approximate Coded Matrix Multiplication Is Nearly Twice as Efficient as Exact Multiplication

H Jeong, A Devulapalli, VR Cadambe… - IEEE Journal on …, 2021 - ieeexplore.ieee.org
We study coded distributed matrix multiplication from an approximate recovery viewpoint.
We consider a system of computation nodes where each node stores a fraction of each multiplicand …
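
To make that system model concrete, here is a sketch of the standard exact polynomial-code baseline (not the paper's ε-approximate scheme): each worker stores half of each multiplicand, and the product is recovered from any 4 of 6 worker results by interpolating a degree-3 matrix polynomial.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 4))

# Split each multiplicand in half, so every worker stores half of A and half of B.
A0, A1 = A[:2], A[2:]          # row blocks of A
B0, B1 = B[:, :2], B[:, 2:]    # column blocks of B

# Polynomial-code style encoding: worker k evaluates the encoded blocks at x_k
# and multiplies them, producing a degree-3 matrix polynomial in x_k.
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])           # 6 workers
results = {k: (A0 + x * A1) @ (B0 + x**2 * B1) for k, x in enumerate(xs)}

# Exact recovery needs any 4 of the 6 results (degree-3 interpolation).
done = [0, 2, 3, 5]                                      # two stragglers dropped
V = np.vander(xs[done], 4, increasing=True)              # Vandermonde system
stacked = np.stack([results[k].reshape(-1) for k in done])
coeffs = np.linalg.solve(V, stacked)                     # x^0, x^1, x^2, x^3 coefficients

C = np.block([[coeffs[0].reshape(2, 2), coeffs[2].reshape(2, 2)],
              [coeffs[1].reshape(2, 2), coeffs[3].reshape(2, 2)]])
assert np.allclose(C, A @ B)
```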