Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement in general-purpose processors due to the foreseeable end of Moore's Law …

Communication-efficient edge AI: Algorithms and systems

Y Shi, K Yang, T Jiang, J Zhang… - … Surveys & Tutorials, 2020 - ieeexplore.ieee.org
Artificial intelligence (AI) has achieved remarkable breakthroughs in a wide range of fields, from speech processing and image classification to drug discovery. This is driven by the …

PowerSGD: Practical low-rank gradient compression for distributed optimization

T Vogels, SP Karimireddy… - Advances in Neural …, 2019 - proceedings.neurips.cc
We study gradient compression methods to alleviate the communication bottleneck in data-
parallel distributed optimization. Despite the significant attention received, current …
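
The core idea in PowerSGD is low-rank gradient compression: each layer's gradient is treated as a matrix and approximated by a rank-r factorization computed with a single power-iteration step, so only two small factors need to be communicated. The sketch below illustrates that idea in NumPy under simplifying assumptions (no error feedback, no all-reduce); the function names and shapes are illustrative, not the authors' reference implementation.

    import numpy as np

    def compress(grad, q_prev):
        # One power-iteration step: approximate grad as p @ q.T and return the factors.
        m = grad.reshape(grad.shape[0], -1)   # view the gradient as a matrix
        p = m @ q_prev                        # (n, rank) left factor
        p, _ = np.linalg.qr(p)                # orthonormalize the left factor
        q = m.T @ p                           # (k, rank) right factor
        return p, q

    def decompress(p, q, shape):
        return (p @ q.T).reshape(shape)

    rng = np.random.default_rng(0)
    grad = rng.standard_normal((256, 128))
    q_prev = rng.standard_normal((128, 2))    # stand-in for the previous step's right factor
    p, q = compress(grad, q_prev)
    approx = decompress(p, q, grad.shape)
    # Only p and q (256*2 + 128*2 values) would be communicated instead of 256*128.
    print(np.linalg.norm(grad - approx) / np.linalg.norm(grad))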

Grace: A compressed communication framework for distributed machine learning

H Xu, CY Ho, AM Abdelmoniem, A Dutta… - 2021 IEEE 41st …, 2021 - ieeexplore.ieee.org
Powerful computer clusters are used nowadays to train complex deep neural networks (DNNs) on large datasets. Distributed training increasingly becomes communication bound …
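
GRACE's contribution is a unified framework in which different gradient compressors can be plugged into the same distributed training pipeline. The sketch below shows the general shape of such a pluggable-compressor abstraction, with a simple scaled sign compressor as one instance; the class and method names are hypothetical and do not reflect GRACE's actual API.

    import numpy as np

    class Compressor:
        # Interface every compressor implements; names here are hypothetical.
        def compress(self, tensor):
            raise NotImplementedError
        def decompress(self, payload, shape):
            raise NotImplementedError

    class SignCompressor(Compressor):
        # 1-bit sign compression with a single scaling factor (scaled sign-SGD style).
        def compress(self, tensor):
            scale = np.mean(np.abs(tensor))
            return np.sign(tensor).astype(np.int8), scale
        def decompress(self, payload, shape):
            signs, scale = payload
            return signs.astype(np.float32).reshape(shape) * scale

    grad = np.random.default_rng(1).standard_normal((4, 4)).astype(np.float32)
    comp = SignCompressor()
    payload = comp.compress(grad)             # what would be sent over the network
    restored = comp.decompress(payload, grad.shape)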

PipeGCN: Efficient full-graph training of graph convolutional networks with pipelined feature communication

C Wan, Y Li, CR Wolfe, A Kyrillidis, NS Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
Graph Convolutional Networks (GCNs) are the state-of-the-art method for learning from graph-structured data, and training large-scale GCNs requires distributed training across multiple …
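
PipeGCN's pipelining idea is to let each partition compute with slightly stale boundary-node features received during the previous iteration, so feature communication overlaps with local computation instead of blocking it. The toy sketch below simulates that staleness pattern in a single process under assumptions of my own (random stand-in features, one ReLU layer); it is not the paper's algorithm or implementation.

    import numpy as np

    def local_forward(local_feats, stale_boundary_feats, weight):
        # Combine local features with (possibly stale) remote boundary features.
        h = np.concatenate([local_feats, stale_boundary_feats], axis=0)
        return np.maximum(h @ weight, 0.0)    # one ReLU graph layer, heavily simplified

    rng = np.random.default_rng(0)
    weight = rng.standard_normal((16, 16)) * 0.1
    stale_boundary = np.zeros((8, 16))        # the first epoch starts from zeros

    for epoch in range(3):
        local_feats = rng.standard_normal((32, 16))
        out = local_forward(local_feats, stale_boundary, weight)
        # In a real distributed run, fresh boundary features would be sent
        # asynchronously here and consumed only in the *next* epoch.
        stale_boundary = rng.standard_normal((8, 16))   # stand-in for received features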

A survey on large-scale machine learning

M Wang, W Fu, X He, S Hao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and has been widely used in real-world applications such as text …

Accelerating distributed reinforcement learning with in-switch computing

Y Li, IJ Liu, Y Yuan, D Chen, A Schwing… - Proceedings of the 46th …, 2019 - dl.acm.org
Reinforcement learning (RL) has attracted much attention recently, as new and emerging AI-based applications are demanding the capability to intelligently react to environment …

Scalecom: Scalable sparsified gradient compression for communication-efficient distributed training

CY Chen, J Ni, S Lu, X Cui, PY Chen… - Advances in …, 2020 - proceedings.neurips.cc
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this …
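
ScaleCom belongs to the family of sparsified gradient compression methods, in which each worker transmits only the largest-magnitude gradient entries and keeps the dropped mass in a local residual (error feedback). The sketch below shows that baseline top-k-with-feedback pattern, not the paper's scalable compressor; the names and the 1% compression ratio are illustrative.

    import numpy as np

    def topk_with_feedback(grad, residual, k):
        corrected = grad + residual                        # add back previously dropped mass
        idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of the k largest magnitudes
        values = corrected[idx]
        new_residual = corrected.copy()
        new_residual[idx] = 0.0                            # what stays behind locally
        return (idx, values), new_residual

    rng = np.random.default_rng(0)
    grad = rng.standard_normal(1000)
    residual = np.zeros_like(grad)
    (idx, values), residual = topk_with_feedback(grad, residual, k=10)
    # Only (idx, values) -- 1% of the entries -- would be communicated.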

Compressed communication for distributed deep learning: Survey and quantitative evaluation

H Xu, CY Ho, AM Abdelmoniem, A Dutta, EH Bergou… - 2020 - repository.kaust.edu.sa
Powerful computer clusters are used nowadays to train complex deep neural networks (DNNs) on large datasets. Distributed training workloads increasingly become …

Check-N-Run: A checkpointing system for training deep learning recommendation models

A Eisenman, KK Matam, S Ingram, D Mudigere… - … USENIX Symposium on …, 2022 - usenix.org
Checkpoints play an important role in training long running machine learning (ML) models.
Checkpoints take a snapshot of an ML model and store it in a non-volatile memory so that …
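
The snippet describes checkpointing as snapshotting an ML model to non-volatile storage. The minimal sketch below shows the basic periodic-checkpoint pattern with an atomic rename, so a crash never leaves a partially written file; the paths, the pickle format, and the interval are my own illustrative choices, and none of Check-N-Run's actual optimizations are shown.

    import os, pickle, tempfile

    def save_checkpoint(state, path):
        # Write to a temporary file first, then rename, so a crash never
        # leaves a partially written checkpoint at `path`.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, path)                 # atomic rename

    state = {"step": 0, "weights": [0.0] * 4}
    for step in range(1, 101):
        state["step"] = step
        if step % 25 == 0:                    # checkpoint every 25 steps
            save_checkpoint(state, "model.ckpt")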