A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications

JS Ng, WYB Lim, NC Luong, Z Xiong… - … Surveys & Tutorials, 2021 - ieeexplore.ieee.org
Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …

Polynomial codes: an optimal design for high-dimensional coded matrix multiplication

Q Yu, M Maddah-Ali… - Advances in Neural …, 2017 - proceedings.neurips.cc
We consider a large-scale matrix multiplication problem where the computation is carried
out using a distributed system with a master node and multiple worker nodes, where each …

Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding

Q Yu, MA Maddah-Ali… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
We consider the problem of massive matrix multiplication, which underlies many data
analytic applications, in a large-scale distributed system comprising a group of worker …

Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey

K Wang, Q Zhou, S Guo, J Luo - IEEE Communications Surveys …, 2018 - ieeexplore.ieee.org
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including query and web service. Meanwhile, cluster frameworks are rapidly developed for …

Short-dot: Computing large linear transforms distributedly using coded short dot products

S Dutta, V Cadambe, P Grover - Advances in Neural …, 2016 - proceedings.neurips.cc
Faced with saturation of Moore's law and increasing size and dimension of data, system
designers have increasingly resorted to parallel and distributed computing to reduce …

On the optimal recovery threshold of coded matrix multiplication

S Dutta, M Fahim, F Haddadpour… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
We provide novel coded computation strategies for distributed matrix-matrix products that
outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required …

Draco: Byzantine-resilient distributed training via redundant gradients

L Chen, H Wang, Z Charles… - … on Machine Learning, 2018 - proceedings.mlr.press
Distributed model training is vulnerable to byzantine system failures and adversarial
compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a …

Coded computation over heterogeneous clusters

A Reisizadeh, S Prakash, R Pedarsani… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
In large-scale distributed computing clusters, such as Amazon EC2, there are several types
of “system noise” that can result in major degradation of performance: system failures …

Communication-computation efficient gradient coding

M Ye, E Abbe - International Conference on Machine …, 2018 - proceedings.mlr.press
This paper develops coding techniques to reduce the running time of distributed learning
tasks. It characterizes the fundamental tradeoff to compute gradients in terms of three …

EEG emotion recognition using dynamical graph convolutional neural networks and broad learning system

X Wang, T Zhang, X Xu, L Chen, X Xing… - … on Bioinformatics and …, 2018 - ieeexplore.ieee.org
In recent years, electroencephalogram (EEG) emotion recognition has been becoming an
emerging field in artificial intelligence area, which can reflect the relation between emotional …