Enable deep learning on mobile devices: Methods, systems, and applications

H Cai, J Lin, Y Lin, Z Liu, H Tang, H Wang… - ACM Transactions on …, 2022 - dl.acm.org
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial
intelligence (AI), including computer vision, natural language processing, and speech …

Dragonn: Distributed randomized approximate gradients of neural networks

Z Wang, Z Xu, X Wu, A Shrivastava… - … on Machine Learning, 2022 - proceedings.mlr.press
Data-parallel distributed training (DDT) has become the de-facto standard for accelerating
the training of most deep learning tasks on massively parallel hardware. In the DDT …
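
The snippet above describes the synchronous data-parallel baseline that DRAGONN builds on. A minimal NumPy sketch of that pattern follows: each simulated worker computes a gradient on its own data shard, the gradients are averaged (the role all-reduce plays in a real system), and every replica applies the same update. DRAGONN's actual contribution, randomized approximate gradients, is not modeled here; all names and shapes are illustrative.

```python
import numpy as np

def data_parallel_step(params, worker_batches, grad_fn, lr=0.1):
    """One step of synchronous data-parallel SGD.

    Each worker computes a gradient on its own shard of the batch; the
    gradients are then averaged (the role played by all-reduce in a real
    DDT framework) and the same update is applied on every replica.
    """
    grads = [grad_fn(params, batch) for batch in worker_batches]  # local compute
    avg_grad = np.mean(grads, axis=0)                             # "all-reduce" (average)
    return params - lr * avg_grad                                 # identical update everywhere

# Toy usage: linear regression split across 4 simulated workers.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(256, 2))
y = X @ w_true

def grad_fn(w, batch):
    Xb, yb = batch
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

params = np.zeros(2)
for step in range(200):
    shards = np.array_split(np.arange(256), 4)
    batches = [(X[idx], y[idx]) for idx in shards]
    params = data_parallel_step(params, batches, grad_fn)
```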

A roadmap for big model

S Yuan, H Zhao, S Zhao, J Leng, Y Liang… - arXiv preprint arXiv …, 2022 - arxiv.org
With the rapid development of deep learning, training Big Models (BMs) for multiple
downstream tasks becomes a popular paradigm. Researchers have achieved various …


Fast fitting of the dynamic memdiode model to the conduction characteristics of RRAM devices using convolutional neural networks

FL Aguirre, E Piros, N Kaiser, T Vogel, S Petzold… - Micromachines, 2022 - mdpi.com
In this paper, the use of Artificial Neural Networks (ANNs) in the form of Convolutional
Neural Networks (AlexNET) for the fast and energy-efficient fitting of the Dynamic Memdiode …
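
As a rough illustration of the fitting setup the abstract describes (a CNN that maps measured conduction characteristics to compact-model parameters), here is a minimal PyTorch sketch. The architecture, input length, and parameter count are placeholders chosen for brevity, not the AlexNet-style network or memdiode parameter set used in the paper.

```python
import torch
import torch.nn as nn

# A small 1D CNN that maps a sampled conduction (I-V) curve to the
# parameters of a compact device model. Sizes and layers are stand-ins.
class CurveToParams(nn.Module):
    def __init__(self, n_samples=128, n_params=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8, 64), nn.ReLU(),
            nn.Linear(64, n_params),         # regressed model parameters
        )

    def forward(self, iv_curve):             # iv_curve: (batch, 1, n_samples)
        return self.head(self.features(iv_curve))

model = CurveToParams()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data.
curves = torch.randn(32, 1, 128)              # simulated conduction characteristics
targets = torch.randn(32, 6)                  # "true" fitting parameters
opt.zero_grad()
loss = loss_fn(model(curves), targets)
loss.backward()
opt.step()
```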

Recognition of sago palm trees based on transfer learning

SMA Letsoin, RC Purwestri, F Rahmawan, D Herak - Remote Sensing, 2022 - mdpi.com
The sago palm tree, known as Metroxylon Sagu Rottb, is one of the priority commodities in
Indonesia. Based on our previous research, the potential habitat of the plant has been …
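
The transfer-learning recipe the title refers to typically looks like the PyTorch/torchvision sketch below: reuse an ImageNet-pretrained backbone, freeze it, and train only a new classification head. The backbone, class count, and data here are stand-ins; the paper's actual models and dataset are not reproduced.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone, freeze its weights, and train
# only a new head for the target classes (here, a made-up binary task).
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False                            # keep pretrained features fixed

backbone.fc = nn.Linear(backbone.fc.in_features, 2)    # new head is trainable

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative step on random stand-in images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = loss_fn(backbone(images), labels)
loss.backward()
optimizer.step()
```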

CP-SGD: Distributed stochastic gradient descent with compression and periodic compensation

E Yu, D Dong, Y Xu, S Ouyang, X Liao - Journal of Parallel and Distributed …, 2022 - Elsevier
Communication overhead is the key challenge for distributed training. Gradient compression
is a widely used approach to reduce communication traffic. When combined with a parallel …
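
A minimal sketch of the two ingredients the title names, gradient compression plus compensation, is shown below: top-k sparsification with a locally kept residual that is folded back into later gradients. CP-SGD's periodic compensation schedule and its interaction with the parallel architecture are not modeled; the function names and the value of k are illustrative.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries; return (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def compressed_step(grad, residual, k):
    """One worker-side step of compressed SGD with local error compensation.

    The error left behind by compression is stored in `residual` and folded
    back into the next gradient, so no contribution is lost forever. CP-SGD
    adds a periodic compensation schedule on top of this basic idea, which
    is not modeled here.
    """
    compensated = grad + residual
    idx, vals = topk_compress(compensated, k)
    sparse = np.zeros_like(grad)
    sparse[idx] = vals
    new_residual = compensated - sparse        # what was dropped this round
    return (idx, vals), new_residual

# Toy usage: compress a 10-element gradient down to its top 3 entries.
rng = np.random.default_rng(0)
residual = np.zeros(10)
for step in range(3):
    grad = rng.normal(size=10)
    (idx, vals), residual = compressed_step(grad, residual, k=3)
```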

Aperiodic local SGD: Beyond local SGD

H Zhang, T Wu, S Cheng, J Liu - … of the 51st International Conference on …, 2022 - dl.acm.org
Variations of stochastic gradient descent (SGD) methods are at the core of training deep
neural network models. However, in distributed deep learning, where multiple computing …
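
For context, a toy NumPy sketch of the local SGD family the paper extends: each worker runs independent SGD steps and the models are averaged only at chosen synchronization points. With an evenly spaced schedule this is ordinary (periodic) local SGD; the aperiodic variant studied in the paper corresponds to an irregular schedule, which the sketch only hints at via an arbitrary set of sync steps.

```python
import numpy as np

def local_sgd(workers, grad_fn, lr, sync_points, total_steps):
    """Local SGD: each worker takes independent SGD steps, and the models
    are averaged only at the steps listed in `sync_points`.

    Evenly spaced sync_points give plain (periodic) local SGD; an
    aperiodic variant simply uses an irregular schedule.
    """
    models = [w.copy() for w in workers]
    for t in range(total_steps):
        models = [m - lr * grad_fn(m, worker_id=i) for i, m in enumerate(models)]
        if t in sync_points:                          # occasional model averaging
            avg = np.mean(models, axis=0)
            models = [avg.copy() for _ in models]
    return models

# Toy usage: 4 workers minimizing slightly different quadratics.
rng = np.random.default_rng(0)
targets = rng.normal(size=(4, 2))
grad_fn = lambda m, worker_id: 2.0 * (m - targets[worker_id])
init = [np.zeros(2) for _ in range(4)]
final = local_sgd(init, grad_fn, lr=0.1, sync_points={5, 9, 20, 45}, total_steps=50)
```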

Virtualflow: Decoupling deep learning models from the underlying hardware

A Or, H Zhang, MJ Freedman - Proceedings of Machine …, 2022 - proceedings.mlsys.org
We propose VirtualFlow, a system leveraging a novel abstraction called virtual node
processing to decouple the model from the hardware. In each step of training or inference …
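
One common way to decouple a job's effective batch and worker count from the physical device count is to time-multiplex several "virtual" workers onto one accelerator and accumulate their gradients before a single update. The PyTorch sketch below illustrates only that general pattern under those assumptions; it is not VirtualFlow's implementation of virtual node processing.

```python
import torch
import torch.nn as nn

# Toy illustration: each virtual node processes its own micro-batch on the
# same physical device, gradients accumulate, and one update is applied, so
# training semantics (effective batch size) stay fixed regardless of how
# much hardware is available. General pattern only, not VirtualFlow itself.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

virtual_nodes = 4                 # fixed by the job, not by the hardware
micro_batch = 8

optimizer.zero_grad()
for v in range(virtual_nodes):    # executed sequentially on one device
    x = torch.randn(micro_batch, 16)
    y = torch.randn(micro_batch, 1)
    loss = loss_fn(model(x), y) / virtual_nodes   # average over virtual nodes
    loss.backward()               # gradients accumulate in .grad
optimizer.step()
```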

Embrace: Accelerating sparse communication for distributed training of deep neural networks

S Li, Z Lai, D Li, Y Zhang, X Ye, Y Duan - Proceedings of the 51st …, 2022 - dl.acm.org
Distributed data-parallel training has been widely adopted for deep neural network (DNN)
models. Although current deep learning (DL) frameworks scale well for dense models like …
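
To make the sparse-communication setting concrete, the NumPy sketch below simulates workers exchanging only the (index, value) pairs of their non-zero gradient entries, as happens with embedding layers in sparse models, so traffic scales with the number of non-zeros rather than the tensor size. The gather-and-rebuild scheme shown is a generic illustration, not EmbRace's communication design.

```python
import numpy as np

def sparse_gather_reduce(worker_grads, dim):
    """Simulate sparse gradient exchange: each worker sends only the
    (index, value) pairs of its non-zero entries, and the dense sum is
    rebuilt from the gathered sparse pieces. Communication volume is
    proportional to the non-zeros, not to `dim`.
    """
    messages = []
    for g in worker_grads:                       # pack: indices + values only
        idx = np.nonzero(g)[0]
        messages.append((idx, g[idx]))
    total = np.zeros(dim)
    for idx, vals in messages:                   # unpack and accumulate
        np.add.at(total, idx, vals)
    return total

# Toy usage: 3 workers, each touching only a few gradient entries.
dim = 20
rng = np.random.default_rng(0)
grads = []
for _ in range(3):
    g = np.zeros(dim)
    hot = rng.choice(dim, size=4, replace=False)
    g[hot] = rng.normal(size=4)
    grads.append(g)
reduced = sparse_gather_reduce(grads, dim)
```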

Bytecomp: Revisiting gradient compression in distributed training

Z Wang, H Lin, Y Zhu, TS Ng - arXiv preprint arXiv:2205.14465, 2022 - arxiv.org
Gradient compression (GC) is a promising approach to addressing the communication
bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal …
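
The trade-off ByteComp reasons about can be sketched with a back-of-envelope cost model: compressing a gradient tensor only helps when the encode/decode overhead is smaller than the communication time it saves. The toy function below illustrates that comparison; the numbers and the model itself are illustrative simplifications, not ByteComp's decision procedure.

```python
def worth_compressing(tensor_bytes, bandwidth_Bps, compress_ratio, overhead_s):
    """Compare sending a gradient uncompressed with compressing it first.

    Returns whether compression pays off under this simplistic model, plus
    both estimated times. A real system would also model overlap with
    compute, network topology, and per-tensor characteristics.
    """
    t_dense = tensor_bytes / bandwidth_Bps
    t_compressed = overhead_s + (tensor_bytes * compress_ratio) / bandwidth_Bps
    return t_compressed < t_dense, t_dense, t_compressed

# Example: a 400 MB gradient over a 10 GB/s link, 1% top-k retention,
# and 25 ms of encode/decode overhead.
ok, t_dense, t_comp = worth_compressing(400e6, 10e9, 0.01, 0.025)
print(ok, round(t_dense, 4), round(t_comp, 4))   # True 0.04 0.0254
```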