Coded distributed computing with partial recovery

E Ozfatura, S Ulukus, D Gündüz - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Coded computation techniques provide robustness against straggling workers in distributed
computing. However, most of the existing schemes require exact provisioning of the …

Straggler-aware distributed learning: Communication–computation latency trade-off

E Ozfatura, S Ulukus, D Gündüz - Entropy, 2020 - mdpi.com
When gradient descent (GD) is scaled to many parallel workers for large-scale machine
learning applications, its per-iteration computation time is limited by straggling workers …

Chebyshev polynomial codes: Task entanglement-based coding for distributed matrix multiplication

S Hong, H Yang, Y Yoon, T Cho… - … Conference on Machine …, 2021 - proceedings.mlr.press
Distributed computing has been a prominent solution to efficiently process massive datasets
in parallel. However, the existence of stragglers is one of the major concerns that slows …

ApproxIFER: A model-agnostic approach to resilient and robust prediction serving systems

M Soleymani, RE Ali, H Mahdavifar… - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Due to the surge of cloud-assisted AI services, the problem of designing resilient prediction
serving systems that can effectively cope with stragglers and minimize response delays has …

Hierarchical coded gradient aggregation for learning at the edge

S Prakash, A Reisizadeh, R Pedarsani… - 2020 IEEE …, 2020 - ieeexplore.ieee.org
Client devices at the edge are generating increasingly large amounts of rich data suitable for
learning powerful statistical models. However, privacy concerns and heavy communication …

Group-wise Verifiable Coded Computing under Byzantine Attacks and Stragglers

S Hong, H Yang, Y Yoon, J Lee - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Distributed computing has emerged as a promising solution for accelerating machine
learning training processes on large-scale datasets by leveraging the parallel processing …

Accelerating Distributed Matrix Multiplication with 4-Dimensional Polynomial Codes

R Nissim, O Schwartz - SIAM Conference on Applied and Computational …, 2023 - SIAM
A single straggler worker may delay an entire distributed system. The state-of-the-art
strategies for mitigating delays in large-scale distributed matrix multiplication are polynomial …

Coded sequential matrix multiplication for straggler mitigation

NK Muralee Krishnan, S Hosseini… - Advances in Neural …, 2020 - proceedings.neurips.cc
In this work, we consider a sequence of $J$ matrix multiplication jobs which needs to be
distributed by a master across multiple worker nodes. For $i \in \{1, 2, \ldots, J\}$, job-$i$ …

Fault-Tolerant Parallel Integer Multiplication

R Nissim, O Schwartz, Y Spiizer - … of the 36th ACM Symposium on …, 2024 - cs.huji.ac.il
Long integer multiplication algorithms are fundamental computational kernels in numerous
applications, ranging from cryptographic systems to neural networks. They serve as …

Coded Sequential Matrix Multiplication for Straggler Mitigation

MN Krishnan, E Hosseini… - IEEE J. Sel. Areas Inf …, 2021 - proceedings.neurips.cc
In this work, we consider a sequence of J matrix multiplication jobs which needs to be
distributed by a master across multiple worker nodes. For i∈{1, 2,..., J}, job-i begins in round …