Distributed deep learning (DL) has become prevalent in recent years to reduce training time by leveraging multiple computing devices (eg, GPUs/TPUs) due to larger models and …
We consider the fully decentralized multi-agent reinforcement learning (MARL) problem, where the agents are connected via a time-varying and possibly sparse communication …
Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies on high …
In decentralized optimization, nodes cooperate to minimize an overall objective function that is the sum (or average) of per-node private objective functions. Algorithms interleave local …
Most commonly used distributed machine learning systems are either synchronous or centralized asynchronous. Synchronous algorithms like AllReduce-SGD perform poorly in a …
This paper considers the problem of distributed optimization over time-varying graphs. For the case of undirected graphs, we introduce a distributed algorithm, referred to as DIGing …
G Qu, N Li - IEEE Transactions on Control of Network Systems, 2017 - ieeexplore.ieee.org
There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions …
While training a machine learning model using multiple workers, each of which collects data from its own data source, it would be useful when the data collected from different workers …
We analyze Local SGD (aka parallel or federated SGD) and Minibatch SGD in the heterogeneous distributed setting, where each machine has access to stochastic gradient …