We consider a scenario involving computations over a massive dataset that is stored in a distributed fashion across multiple workers, a setting at the core of distributed learning algorithms. We propose …
J Jiang, S Gan, Y Liu, F Wang, G Alonso… - Proceedings of the …, 2021 - dl.acm.org
The appeal of serverless (FaaS) has triggered growing interest in how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML). Several …
Faced with the saturation of Moore's law and the increasing size and dimensionality of data, system designers have increasingly resorted to parallel and distributed computing to reduce …
S Dutta, M Fahim, F Haddadpour… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required …
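To make the "Polynomial code" baseline mentioned in this snippet concrete, here is a minimal NumPy sketch of that style of coded matrix multiplication: each worker multiplies polynomial-encoded blocks of A and B, and the master interpolates the product blocks once any m·n workers respond. The block counts, evaluation points, and simulated straggler pattern are illustrative assumptions; this is the baseline construction, not the improved codes proposed in the cited paper.

```python
import numpy as np

m, n = 2, 2                 # split A into m row blocks, B into n column blocks
num_workers = 6             # more workers than the recovery threshold m*n = 4
rng = np.random.default_rng(0)

A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))
A_blocks = np.split(A, m, axis=0)    # A_0, A_1 (row blocks)
B_blocks = np.split(B, n, axis=1)    # B_0, B_1 (column blocks)

# Encoding: worker at point x computes A~(x) B~(x), where
# A~(x) = sum_j A_j x^j and B~(x) = sum_k B_k x^(k*m); the product is a
# degree-(m*n - 1) matrix polynomial whose coefficients are the blocks A_j B_k.
xs = np.arange(1, num_workers + 1, dtype=float)

def worker_result(x):
    A_enc = sum(A_blocks[j] * x**j for j in range(m))
    B_enc = sum(B_blocks[k] * x**(k * m) for k in range(n))
    return A_enc @ B_enc

# Suppose only the first m*n workers respond (the rest are stragglers).
survivors = xs[: m * n]
results = np.stack([worker_result(x) for x in survivors])

# Decoding: interpolate the matrix polynomial from m*n evaluations by
# solving a Vandermonde system, then read off the coefficient blocks.
V = np.vander(survivors, N=m * n, increasing=True)
coeffs = np.linalg.solve(V, results.reshape(m * n, -1))
coeffs = coeffs.reshape(m * n, *results.shape[1:])

C = np.block([[coeffs[j + k * m] for k in range(n)] for j in range(m)])
assert np.allclose(C, A @ B)
```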
In large-scale distributed computing clusters, such as Amazon EC2, there are several types of “system noise” that can result in major degradation of performance: system failures …
S Prakash, S Dhakal, MR Akdeniz… - IEEE Journal on …, 2020 - ieeexplore.ieee.org
Federated learning enables training a global model from data located at the client nodes, without data sharing and moving client data to a centralized server. Performance of …
Gradient coding is a technique for straggler mitigation in distributed learning. In this paper we design novel gradient codes using tools from classical coding theory, namely, cyclic …
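As a rough illustration of the gradient coding setup this snippet refers to, the sketch below uses the classic 3-worker cyclic-assignment example: each worker transmits one linear combination of two partial gradients, and the full gradient sum is recoverable from any two responding workers. The encoding matrix and gradient dimension are illustrative assumptions, not the cyclic-code construction of the cited paper.

```python
import itertools
import numpy as np

n, s = 3, 1                              # n workers, tolerate s stragglers
d = 5                                    # gradient dimension (assumed)
rng = np.random.default_rng(1)
partial_grads = rng.standard_normal((n, d))   # g_1, g_2, g_3, one per partition
full_grad = partial_grads.sum(axis=0)

# Encoding matrix B: row i is the combination worker i sends; worker i only
# touches partitions {i, i+1 mod 3} (cyclic assignment of s+1 partitions each).
B = np.array([
    [0.5, 1.0,  0.0],
    [0.0, 1.0, -1.0],
    [0.5, 0.0,  1.0],
])
worker_msgs = B @ partial_grads          # what each worker would transmit

# Decoding: for any n - s responding workers S, find coefficients a with
# a^T B_S = (1, ..., 1); then a^T (messages of S) equals the full gradient.
for survivors in itertools.combinations(range(n), n - s):
    B_S = B[list(survivors), :]
    a, *_ = np.linalg.lstsq(B_S.T, np.ones(n), rcond=None)
    decoded = a @ worker_msgs[list(survivors)]
    assert np.allclose(decoded, full_grad)
```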
Due to the large size of the training data, distributed learning approaches such as federated learning have gained attention recently. However, the convergence rate of distributed …
Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in waiting for the slowest learners (stragglers). Asynchronous methods …
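A small simulation makes the straggler effect in this snippet concrete: in synchronous SGD the per-iteration time is the slowest worker's delay, while dropping the k slowest responses (one common mitigation, not necessarily the cited paper's method) shortens the wait at the cost of some gradient information. The delay model below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
num_workers, num_iters, k_drop = 20, 1000, 2

# Exponential compute delays, with worker 0 persistently slow (assumed model).
delays = rng.exponential(scale=1.0, size=(num_iters, num_workers))
delays[:, 0] *= 5.0

wait_full = delays.max(axis=1)                          # wait for all workers
wait_drop = np.sort(delays, axis=1)[:, -(k_drop + 1)]   # wait for fastest n - k

print(f"mean iteration time, full sync     : {wait_full.mean():.2f}")
print(f"mean iteration time, drop {k_drop} slowest: {wait_drop.mean():.2f}")
```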