Deep learning for misinformation detection on online social networks: a survey and new perspectives

MR Islam, S Liu, X Wang, G Xu - Social Network Analysis and Mining, 2020 - Springer
Recently, the use of social networks such as Facebook, Twitter, and Sina Weibo has
become an inseparable part of our daily lives. It is considered a convenient platform for …

Distributed machine learning for wireless communication networks: Techniques, architectures, and applications

S Hu, X Chen, W Ni, E Hossain… - … Surveys & Tutorials, 2021 - ieeexplore.ieee.org
Distributed machine learning (DML) techniques, such as federated learning, partitioned
learning, and distributed reinforcement learning, have been increasingly applied to wireless …
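
To make the federated learning idea named here concrete, below is a minimal FedAvg-style sketch in Python: each client runs a few local SGD steps on its own data, and a server averages the resulting models, weighting by client data size. The linear model, the client setup, and all function names are illustrative assumptions, not taken from the survey.

    # Minimal FedAvg-style sketch: clients train locally, a server averages weights.
    # The linear least-squares model and all names are illustrative assumptions.
    import numpy as np

    def local_update(w, X, y, lr=0.1, epochs=5):
        """One client's local full-batch SGD on a linear least-squares model."""
        for _ in range(epochs):
            grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
            w = w - lr * grad
        return w

    def federated_round(w_global, clients):
        """Server broadcasts w_global, clients update locally, server averages."""
        local_models = [local_update(w_global.copy(), X, y) for X, y in clients]
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        weights = sizes / sizes.sum()                # weight clients by data size
        return sum(wk * m for wk, m in zip(weights, local_models))

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for _ in range(4):                               # four simulated clients
        X = rng.normal(size=(50, 2))
        y = X @ true_w + 0.1 * rng.normal(size=50)
        clients.append((X, y))

    w = np.zeros(2)
    for _ in range(20):                              # 20 communication rounds
        w = federated_round(w, clients)
    print("recovered weights:", w)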

Highly scalable deep learning training system with mixed-precision: Training imagenet in four minutes

X Jia, S Song, W He, Y Wang, H Rong, F Zhou… - arXiv preprint arXiv …, 2018 - arxiv.org
Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely
used in training large-scale deep neural networks. Although using larger mini-batch sizes …
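
A rough sketch of the training scheme this entry describes, synchronized data-parallel SGD combined with mixed precision, is given below. It simulates four workers on CPU with NumPy: fp32 master weights, fp16 forward/backward passes with a fixed loss scale, and a synchronous average of per-worker gradients standing in for an all-reduce. The toy linear model, the loss-scale value, and all names are assumptions for illustration only.

    # Sketch of synchronized data-parallel SGD with mixed precision (simulated on CPU).
    # fp16 compute, fp32 master weights, and loss scaling form a common recipe; the
    # toy linear model and all names here are assumptions for illustration.
    import numpy as np

    LOSS_SCALE = 128.0                       # fixed loss scale keeps fp16 grads from underflowing

    def worker_grad(w_fp32, X, y):
        """One worker: forward/backward in fp16, return the unscaled fp32 gradient."""
        w16 = w_fp32.astype(np.float16)
        X16, y16 = X.astype(np.float16), y.astype(np.float16)
        err = X16 @ w16 - y16
        scaled = (2 * X16.T @ (err * np.float16(LOSS_SCALE))) / np.float16(len(y))
        return scaled.astype(np.float32) / LOSS_SCALE

    rng = np.random.default_rng(0)
    true_w = np.array([0.5, -0.25], dtype=np.float32)
    X = rng.normal(size=(512, 2)).astype(np.float32)
    y = X @ true_w

    w = np.zeros(2, dtype=np.float32)            # fp32 master weights
    shards = np.array_split(np.arange(512), 4)   # mini-batch split across 4 "workers"
    for step in range(200):
        grads = [worker_grad(w, X[idx], y[idx]) for idx in shards]
        g = np.mean(grads, axis=0)               # synchronous aggregation (stand-in for all-reduce)
        w -= 0.05 * g                            # SGD update on master weights
    print("learned (approximately):", w)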

The next generation of deep learning hardware: Analog computing

W Haensch, T Gokmen, R Puri - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
Initially developed for gaming and 3-D rendering, graphics processing units (GPUs) were
recognized as a good fit for accelerating deep learning training. Their simple mathematical …

Priority-based parameter propagation for distributed DNN training

A Jayarajan, J Wei, G Gibson… - Proceedings of …, 2019 - proceedings.mlsys.org
Data parallel training is widely used for scaling distributed deep neural network (DNN)
training. However, the performance benefits are often limited by the communication-heavy …
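
The scheduling idea behind priority-based propagation can be illustrated with the small Python sketch below; it is not the paper's P3 system. Backprop emits gradients from the last layer to the first, while the next forward pass needs the first layers earliest, so pending transfers are drawn from a priority queue keyed on layer index rather than sent first-in-first-out. The one-gradient link lag is an assumption made purely to create a backlog worth reordering.

    # Illustrative sketch of priority-based gradient propagation (not the paper's P3
    # system). Backprop produces gradients from the last layer down to the first, but
    # the next forward pass consumes parameters from the first layer upward, so pending
    # transfers are ordered by layer index instead of first-in-first-out. The link is
    # modeled as lagging backprop by one gradient, so there is always a small backlog
    # for the scheduler to reorder.
    import heapq

    def priority_send_order(num_layers):
        pending, sent = [], []
        for produced in range(num_layers - 1, -1, -1):   # backprop: last layer finishes first
            heapq.heappush(pending, produced)            # lower layer index = higher priority
            if len(pending) >= 2:                        # link runs one gradient behind compute
                sent.append(heapq.heappop(pending))
        while pending:                                   # flush whatever is left after backprop
            sent.append(heapq.heappop(pending))
        return sent

    print("FIFO send order    :", list(range(5, -1, -1)))   # [5, 4, 3, 2, 1, 0]
    print("priority send order:", priority_send_order(6))   # [4, 3, 2, 1, 0, 5]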

Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

A distributed synchronous SGD algorithm with global top-k sparsification for low bandwidth networks

S Shi, Q Wang, K Zhao, Z Tang, Y Wang… - 2019 IEEE 39th …, 2019 - ieeexplore.ieee.org
Distributed synchronous stochastic gradient descent (S-SGD) with data parallelism has
been widely used in training large-scale deep neural networks (DNNs), but it typically …
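
As a rough illustration of the global top-k idea in the title (not the paper's communication scheme), the Python sketch below has each of four workers propose its k largest-magnitude gradient entries, aggregates the proposals, and keeps only the k globally largest entries as the sparse update. All sizes and names are assumptions.

    # Minimal sketch of global top-k gradient sparsification (illustrative only).
    # Each worker proposes its k largest entries; only the k globally largest
    # aggregated candidates are kept, so the final update stays k-sparse.
    import numpy as np

    def local_topk(grad, k):
        """Return (indices, values) of the k largest-magnitude entries of grad."""
        idx = np.argpartition(np.abs(grad), -k)[-k:]
        return idx, grad[idx]

    def global_topk(worker_grads, k):
        """Aggregate per-worker proposals, then keep the k globally largest entries."""
        candidates = {}
        for g in worker_grads:
            idx, vals = local_topk(g, k)
            for i, v in zip(idx, vals):
                # non-proposing workers contribute zero for this coordinate
                candidates[i] = candidates.get(i, 0.0) + v / len(worker_grads)
        top = sorted(candidates.items(), key=lambda kv: -abs(kv[1]))[:k]
        sparse = np.zeros_like(worker_grads[0])
        for i, v in top:
            sparse[i] = v
        return sparse

    rng = np.random.default_rng(1)
    grads = [rng.normal(size=1000) for _ in range(4)]   # gradients from 4 workers
    sparse_update = global_topk(grads, k=10)
    print("nonzero entries sent:", np.count_nonzero(sparse_update), "of", sparse_update.size)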

Distributed deep learning in open collaborations

M Diskin, A Bukhtiyarov, M Ryabinin… - Advances in …, 2021 - proceedings.neurips.cc
Modern deep learning applications require increasingly more compute to train state-of-the-
art models. To address this demand, large corporations and institutions use dedicated High …

CarbonScaler: Leveraging cloud workload elasticity for optimizing carbon-efficiency

WA Hanafy, Q Liang, N Bashir, D Irwin… - Proceedings of the ACM …, 2023 - dl.acm.org
Cloud platforms are increasing their emphasis on sustainability and reducing their
operational carbon footprint. A common approach for reducing carbon emissions is to exploit …
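
One way to picture the workload-elasticity approach mentioned here is the toy policy below, which gives an elastic batch job more workers in low-carbon hours and fewer in high-carbon hours. The linear allocation rule, the worker bounds, and the hourly forecast are all assumptions for illustration, not CarbonScaler's actual algorithm.

    # Illustrative carbon-aware scaling sketch (not CarbonScaler's actual policy):
    # allocate workers inversely to forecast carbon intensity, hour by hour.
    def scale_plan(carbon_intensity, min_workers=1, max_workers=16):
        """Map hourly carbon intensity (gCO2/kWh) to a per-hour worker count."""
        lo, hi = min(carbon_intensity), max(carbon_intensity)
        plan = []
        for c in carbon_intensity:
            frac = 1.0 if hi == lo else (hi - c) / (hi - lo)   # 1 at cleanest hour, 0 at dirtiest
            plan.append(round(min_workers + frac * (max_workers - min_workers)))
        return plan

    # hypothetical hourly carbon-intensity forecast for one day
    forecast = [320, 300, 280, 250, 210, 180, 160, 170, 200, 240,
                290, 340, 380, 400, 390, 360, 330, 310, 300, 290,
                310, 330, 340, 335]
    print(scale_plan(forecast))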

A convergence analysis of distributed SGD with communication-efficient gradient sparsification

S Shi, K Zhao, Q Wang, Z Tang, X Chu - IJCAI, 2019 - ijcai.org
Gradient sparsification is a promising technique to significantly reduce the communication
overhead in decentralized synchronous stochastic gradient descent (S-SGD) algorithms …
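
The sparsification mechanism typically analyzed in this line of work can be sketched as top-k selection with local error (residual) accumulation, as in the illustrative Python snippet below; the toy least-squares problem, step size, and choice of k are assumptions rather than the paper's setup.

    # Sketch of top-k sparsified SGD with local error accumulation (a common element
    # in convergence analyses of sparsified S-SGD; the setup here is illustrative).
    import numpy as np

    def sparsify_with_residual(grad, residual, k):
        """Add the residual, keep the k largest entries, carry the rest forward."""
        acc = grad + residual
        idx = np.argpartition(np.abs(acc), -k)[-k:]
        sparse = np.zeros_like(acc)
        sparse[idx] = acc[idx]
        return sparse, acc - sparse            # leftover mass feeds the next step

    rng = np.random.default_rng(2)
    d, k = 100, 5
    true_w = rng.normal(size=d)
    w = np.zeros(d)
    residual = np.zeros(d)
    init_err = np.linalg.norm(w - true_w)
    for step in range(500):
        X = rng.normal(size=(32, d))
        y = X @ true_w
        grad = 2 * X.T @ (X @ w - y) / 32       # mini-batch gradient of squared error
        sparse_grad, residual = sparsify_with_residual(grad, residual, k)
        w -= 0.01 * sparse_grad                 # only k coordinates are communicated/applied
    print("initial error:", init_err, " final error:", np.linalg.norm(w - true_w))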