Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding

Q Yu, MA Maddah-Ali… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
We consider the problem of massive matrix multiplication, which underlies many data
analytic applications, in a large-scale distributed system comprising a group of worker …

Entangled polynomial codes for secure, private, and batch distributed matrix multiplication: Breaking the "cubic" barrier

Q Yu, AS Avestimehr - 2020 IEEE International Symposium on …, 2020 - ieeexplore.ieee.org
In distributed matrix multiplication, a common scenario is to assign each worker a fraction of
the multiplication task, by partitioning the input matrices into smaller submatrices. In …

Coded computing for resilient, secure, and privacy-preserving distributed matrix multiplication

Q Yu, AS Avestimehr - IEEE Transactions on Communications, 2020 - ieeexplore.ieee.org
Coded computing is a new framework to address fundamental issues in large scale
distributed computing, by injecting structured randomness and redundancy. We first provide …

Sparse random Khatri-Rao product codes for distributed matrix multiplication

R Ji, A Heidarzadeh… - 2022 IEEE Information …, 2022 - ieeexplore.ieee.org
We introduce two generalizations to the paradigm of using Random Khatri-Rao Product
(RKRP) codes for distributed matrix multiplication. We first introduce a class of codes called …

Lightweight projective derivative codes for compressed asynchronous gradient descent

PJ Soto, I Ilmer, H Guan, J Li - International Conference on …, 2022 - proceedings.mlr.press
Coded distributed computation has become common practice for performing gradient
descent on large datasets to mitigate stragglers and other faults. This paper proposes a …

Coded Distributed Multiplication for Matrices of Different Sparsity Levels

JA Lin, YC Huang, MC Lee… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The problem of computing batches of matrix multiplications in distributed computing systems
with stragglers is studied. Unlike existing works in the literature, the matrices in a batch are …

Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors

P Soto - 2024 IEEE Information Theory Workshop (ITW), 2024 - ieeexplore.ieee.org
Tensors are a fundamental operation in distributed computing, e.g., machine learning, that
are commonly distributed into multiple parallel tasks for large datasets. Stragglers and other …

Locally Random Alloy Codes with Channel Coding Theorems for Distributed Matrix Multiplication

P Soto, H Guan, J Li - arXiv preprint arXiv:2202.03469, 2022 - arxiv.org
Matrix multiplication is a fundamental operation in machine learning and is commonly
distributed into multiple parallel tasks for large datasets. Stragglers and other failures can …

Coded Distributed Function Computation

PJ Soto - 2022 - search.proquest.com
A ubiquitous problem in computer science research is the optimization of computation on
large data sets. Such computations are usually too large to be performed on one machine …