A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications

JS Ng, WYB Lim, NC Luong, Z Xiong… - … Surveys & Tutorials, 2021 - ieeexplore.ieee.org
Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …

Private retrieval, computing, and learning: Recent progress and future challenges

S Ulukus, S Avestimehr, M Gastpar… - IEEE Journal on …, 2022 - ieeexplore.ieee.org
Most of our lives are conducted in cyberspace. The human notion of privacy translates
into a cyber notion of privacy for the many functions that take place in cyberspace. This …

FedPAQ: A communication-efficient federated learning method with periodic averaging and quantization

A Reisizadeh, A Mokhtari, H Hassani… - International …, 2020 - proceedings.mlr.press
Federated learning is a distributed framework in which a model is trained over a set of
devices while keeping the data localized. This framework faces several systems-oriented …
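
The entry above names the two ingredients of the method, periodic averaging and quantization. Below is a minimal Python sketch of that structure, under assumptions not taken from the paper: a toy quadratic loss per client, full client participation, and a simple unbiased stochastic quantizer; FedPAQ itself samples a subset of clients per round and comes with convergence guarantees not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_quantize(v, levels=16):
    """Simple unbiased stochastic quantizer (an assumption of this sketch;
    the paper allows any unbiased quantizer): scale |v| to [0, levels] and
    round up or down at random so the expectation equals v."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(v.shape) < (scaled - lower))
    return np.sign(v) * q * norm / levels

# Toy setup (assumed): client k holds the quadratic loss 0.5 * ||x - c_k||^2.
centers = rng.normal(size=(10, 5))            # 10 clients, 5-dimensional model
x_server = np.zeros(5)
tau, lr, rounds = 5, 0.1, 50                  # tau local SGD steps per communication round

for _ in range(rounds):
    updates = []
    for c in centers:                         # FedPAQ samples a client subset; all participate here
        x = x_server.copy()
        for _ in range(tau):                  # "periodic averaging": tau local steps between syncs
            x -= lr * (x - c)                 # gradient of the local quadratic loss
        updates.append(stoch_quantize(x - x_server))   # upload a quantized model difference
    x_server += np.mean(updates, axis=0)      # server averages the quantized updates

opt = centers.mean(axis=0)                    # minimizer of the average loss
print("distance to optimum:", round(float(np.linalg.norm(x_server - opt)), 3))
```

The server ends up in a small neighborhood of the optimum; the residual error comes from the quantization noise, which averaging over clients keeps bounded.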

Lagrange coded computing: Optimal design for resiliency, security, and privacy

Q Yu, S Li, N Raviv, SMM Kalan… - The 22nd …, 2019 - proceedings.mlr.press
We consider a scenario involving computations over a massive dataset stored distributedly
across multiple workers, which is at the core of distributed learning algorithms. We propose …
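
As a concrete illustration of the Lagrange encoding idea, here is a real-valued sketch for one assumed polynomial computation, f(X) = Xᵀ X (degree 2). The paper's construction works over finite fields and additionally covers security and privacy, which this sketch does not model; it only shows encoding, per-worker evaluation, and interpolation-based recovery with one straggler.

```python
import numpy as np

def lagrange_weights(points, z):
    """Scalar Lagrange weights: evaluating at z the unique low-degree
    polynomial interpolating values given at the listed points."""
    w = []
    for i, a in enumerate(points):
        num = np.prod([z - b for j, b in enumerate(points) if j != i])
        den = np.prod([a - b for j, b in enumerate(points) if j != i])
        w.append(num / den)
    return np.array(w)

rng = np.random.default_rng(1)
K = 3                                        # number of data blocks
X = [rng.normal(size=(4, 2)) for _ in range(K)]
f = lambda M: M.T @ M                        # assumed example: a degree-2 polynomial map

betas = np.array([1.0, 2.0, 3.0])            # points carrying the data blocks
alphas = np.arange(4.0, 10.0)                # points handed to N = 6 workers

# Encoding: worker i receives u(alpha_i), where u(z) is the Lagrange
# interpolant with u(beta_j) = X_j.  Each worker applies f to its block.
encoded = [sum(w * Xj for w, Xj in zip(lagrange_weights(betas, a), X)) for a in alphas]
worker_out = [f(E) for E in encoded]

# f(u(z)) has degree deg(f) * (K - 1) = 4, so any 5 of the 6 results suffice.
fast = [0, 1, 3, 4, 5]                       # pretend the third worker straggles
for j, b in enumerate(betas):
    w = lagrange_weights(alphas[fast], b)
    decoded = sum(wi * worker_out[i] for wi, i in zip(w, fast))
    assert np.allclose(decoded, f(X[j]))     # equals f applied to the original block
print("recovered f(X_j) for every data block from 5 of 6 workers")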

Gradient coding: Avoiding stragglers in distributed learning

R Tandon, Q Lei, AG Dimakis… - … on Machine Learning, 2017 - proceedings.mlr.press
We propose a novel coding theoretic framework for mitigating stragglers in distributed
learning. We show how carefully replicating data blocks and coding across gradients can …
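
The replication-plus-coding idea in this entry can be checked numerically on the smallest interesting case: 3 workers and 3 data partitions, each worker storing two partitions and returning a single coded combination, so that the full gradient is recovered from any two responses. The coefficients below are one valid choice in the spirit of the paper's introductory example.

```python
import numpy as np

rng = np.random.default_rng(2)
g1, g2, g3 = (rng.normal(size=4) for _ in range(3))  # partial gradients of the 3 data partitions
full = g1 + g2 + g3                                  # what the master needs each iteration

# Each worker stores two partitions and returns a single coded combination.
sends = {
    1: 0.5 * g1 + g2,     # worker 1 holds partitions {1, 2}
    2: g2 - g3,           # worker 2 holds partitions {2, 3}
    3: 0.5 * g1 + g3,     # worker 3 holds partitions {1, 3}
}

# Decoding coefficients: any 2 of the 3 responses reproduce g1 + g2 + g3,
# so a single straggler never delays the update.
decode = {
    (1, 2): (2.0, -1.0),
    (1, 3): (1.0, 1.0),
    (2, 3): (1.0, 2.0),
}

for (i, j), (a, b) in decode.items():
    assert np.allclose(a * sends[i] + b * sends[j], full)
print("full gradient recovered from every pair of workers")
```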

Speeding up distributed machine learning using codes

K Lee, M Lam, R Pedarsani… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
Codes are widely used in many engineering applications to offer robustness against noise.
In large-scale systems, there are several types of noise that can affect the performance of …
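
One of the schemes this line of work studies is MDS-coded matrix-vector multiplication. The sketch below uses an illustrative (3, 2) code on the row blocks of A, so that A @ x is recovered from whichever two of the three workers finish first.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 4))
x = rng.normal(size=4)

# (3, 2) MDS encoding of the row blocks of A: any 2 coded blocks determine A @ x.
A1, A2 = A[:3], A[3:]
tasks = {1: A1, 2: A2, 3: A1 + A2}                 # worker 3 holds the parity block

results = {w: Ai @ x for w, Ai in tasks.items()}   # each worker multiplies its block by x

def decode(done):
    """Recover A @ x from any 2 completed workers (the third may be a straggler)."""
    if {1, 2} <= done.keys():
        return np.concatenate([done[1], done[2]])
    if {1, 3} <= done.keys():
        return np.concatenate([done[1], done[3] - done[1]])
    return np.concatenate([done[3] - done[2], done[2]])   # workers {2, 3} finished

for straggler in (1, 2, 3):
    done = {w: r for w, r in results.items() if w != straggler}
    assert np.allclose(decode(done), A @ x)
print("A @ x recovered despite any single straggler")
```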

Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding

Q Yu, MA Maddah-Ali… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
We consider the problem of massive matrix multiplication, which underlies many data
analytic applications, in a large-scale distributed system comprising a group of worker …
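
For intuition, the sketch below follows the simpler Polynomial-code construction from the same authors' earlier work rather than the entangled codes of this paper: coded column blocks of A and B are assigned to workers so that each response is an evaluation of a low-degree matrix polynomial, and AᵀB is interpolated from any m·n responses.

```python
import numpy as np

rng = np.random.default_rng(4)
# Goal: C = A.T @ B, with A split into m = 2 column blocks and B into n = 2 column blocks.
A = rng.normal(size=(6, 4)); B = rng.normal(size=(6, 4))
m = n = 2
A0, A1 = A[:, :2], A[:, 2:]
B0, B1 = B[:, :2], B[:, 2:]

# Encoding: worker i gets Atilde(x_i) = A0 + A1*x_i and Btilde(x_i) = B0 + B1*x_i**m
# and returns Atilde.T @ Btilde, a degree-3 matrix polynomial in x_i.
# Any m*n = 4 responses allow interpolation, so 1 of the 5 workers may straggle.
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
outs = [(A0 + A1 * x).T @ (B0 + B1 * x**m) for x in xs]

# Decode from the first 4 finishers: interpolate the degree-3 polynomial
# entrywise, then read off its matrix coefficients.
use = list(range(4))
V = np.vander(xs[use], N=4, increasing=True)       # V[i, d] = x_i**d
flat = np.stack([outs[i].ravel() for i in use])    # one row per worker response
coeffs = np.linalg.solve(V, flat)                  # flattened matrix coefficients

blocks = coeffs.reshape(4, 2, 2)                   # degree d holds A_j.T @ B_k with d = j + k*m
C = np.block([[blocks[0], blocks[2]],
              [blocks[1], blocks[3]]])
assert np.allclose(C, A.T @ B)
print("A.T @ B recovered from 4 of 5 workers")
```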

A fundamental tradeoff between computation and communication in distributed computing

S Li, MA Maddah-Ali, Q Yu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
How can we optimally trade extra computing power to reduce the communication load in
distributed computing? We answer this question by characterizing a fundamental tradeoff …
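
The tradeoff the paper characterizes can be stated in one formula linking the computation load r (the number of nodes that redundantly compute each map task) to the minimum communication load during the shuffle phase; a worked example follows in the comments.

```latex
% K nodes, computation load r = number of nodes that redundantly map each file.
% Optimal communication (shuffle) load for integer r; non-integer r lies on the
% lower convex envelope of these points:
\[
  L^{*}(r) \;=\; \frac{1}{r}\left(1 - \frac{r}{K}\right),
  \qquad r \in \{1, 2, \dots, K\}.
\]
% Worked example with K = 10: at r = 1 (no redundancy) the load is
% L^{*}(1) = 1 - 1/10 = 0.9; raising the computation load to r = 2 gives
% L^{*}(2) = (1/2)(1 - 2/10) = 0.4, a factor-of-r gain over the uncoded
% load 1 - r/K = 0.8 at the same replication level.
```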

On the optimal recovery threshold of coded matrix multiplication

S Dutta, M Fahim, F Haddadpour… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
We provide novel coded computation strategies for distributed matrix-matrix products that
outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required …
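
A minimal sketch of the inner-dimension split behind this improvement (a MatDot-style code, written here over the reals for illustration): with the inner dimension cut into m = 2 pieces, the product is read off the middle coefficient of a degree-2 matrix polynomial, so any 2m − 1 = 3 responses suffice, versus m² = 4 for a Polynomial-code split of the same granularity.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 4)); B = rng.normal(size=(4, 3))

# Split along the *inner* dimension into m = 2 pieces, so A @ B = A0@B0 + A1@B1.
m = 2
A0, A1 = A[:, :2], A[:, 2:]
B0, B1 = B[:2, :], B[2:, :]

# Encoding: pA(x) = A0 + A1*x and pB(x) = B0*x + B1.  The product pA(x) @ pB(x)
# is a degree-2 matrix polynomial whose x^1 coefficient is exactly
# A0@B0 + A1@B1 = A@B, so any 2m - 1 = 3 worker evaluations suffice.
xs = np.array([1.0, 2.0, 3.0, 4.0])                   # 4 workers, 1 may straggle
outs = [(A0 + A1 * x) @ (B0 * x + B1) for x in xs]

use = [0, 1, 3]                                       # any 3 responders
V = np.vander(xs[use], N=3, increasing=True)          # V[i, d] = x_i**d
coeffs = np.linalg.solve(V, np.stack([outs[i].ravel() for i in use]))
C = coeffs[1].reshape(3, 3)                           # read off the x^1 coefficient

assert np.allclose(C, A @ B)
print("A @ B recovered from 3 of 4 workers")
```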

DRACO: Byzantine-resilient distributed training via redundant gradients

L Chen, H Wang, Z Charles… - … on Machine Learning, 2018 - proceedings.mlr.press
Distributed model training is vulnerable to Byzantine system failures and adversarial
compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a …
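
The simplest instance of the redundant-gradients idea is repetition with an exact majority vote: if each gradient is computed by r = 2s + 1 workers, up to s Byzantine copies cannot displace the honest value. The sketch below shows only this repetition case; DRACO's more communication-efficient encoders and decoders are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(6)
K, s = 4, 1                           # 4 gradient tasks, tolerate s = 1 Byzantine copy per task
r = 2 * s + 1                         # replication factor: each task runs on r = 3 workers

true_grads = [rng.normal(size=5) for _ in range(K)]

def majority(vectors):
    """Return the vector that appears identically in a strict majority of copies."""
    for v in vectors:
        if sum(np.array_equal(v, u) for u in vectors) > len(vectors) // 2:
            return v
    raise RuntimeError("no majority: more than s corrupted copies")

recovered = []
for g in true_grads:
    copies = [g.copy() for _ in range(r)]
    copies[rng.integers(r)] = rng.normal(size=5)      # one Byzantine worker lies arbitrarily
    recovered.append(majority(copies))

assert all(np.array_equal(a, b) for a, b in zip(recovered, true_grads))
print("all gradients recovered despite one Byzantine copy per task")
```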