FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks

J Bae, J Lee, Y Jin, S Son, S Kim, H Jang… - … USENIX Conference on …, 2021 - usenix.org
Deep neural networks (DNNs) are widely used in various AI application domains such as
computer vision, natural language processing, autonomous driving, and bioinformatics. As …

Behemoth: a flash-centric training accelerator for extreme-scale DNNs

S Kim, Y Jin, G Sohn, J Bae, TJ Ham… - 19th USENIX Conference …, 2021 - usenix.org
The explosive growth of deep neural network (DNN) model sizes drives the need for
larger memory capacity. This trend is particularly true for models in natural language …

Stannis: low-power acceleration of DNN training using computational storage devices

A HeydariGorji, M Torabzadehkashi… - 2020 57th ACM/IEEE …, 2020 - ieeexplore.ieee.org
Computational storage devices enable in-place processing of data inside the storage device. These
devices contain 64-bit application processors and hardware accelerators that can help …

Stronghold: fast and affordable billion-scale deep learning model training

X Sun, W Wang, S Qiu, R Yang… - … Conference for High …, 2022 - ieeexplore.ieee.org
Deep neural networks (DNNs) with billion-scale parameters have demonstrated impressive
performance in solving many tasks. Unfortunately, training a billion-scale DNN is out of the …

Mini-batch serialization: CNN training with inter-layer data reuse

S Lym, A Behroozi, W Wen, G Li… - … of Machine Learning …, 2019 - proceedings.mlsys.org
Training convolutional neural networks (CNNs) requires intense computations and high
memory bandwidth. We find that bandwidth today is over-provisioned because most memory …

Harmony: Overcoming the hurdles of GPU memory capacity to train massive DNN models on commodity servers

Y Li, A Phanishayee, D Murray, J Tarnawski… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep neural networks (DNNs) have grown exponentially in size over the past decade,
leaving only those who have massive datacenter-based resources with the ability to develop …

GradPIM: A practical processing-in-DRAM architecture for gradient descent

H Kim, H Park, T Kim, K Cho, E Lee… - … Symposium on High …, 2021 - ieeexplore.ieee.org
In this paper, we present GradPIM, a processing-in-memory architecture that accelerates
the parameter updates of deep neural network training. As one of processing-in-memory …

Dynamic memory management for GPU-based training of deep neural networks

SB Shriram, A Garg, P Kulkarni - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Deep learning has been widely adopted for different applications of artificial intelligence:
speech recognition, natural language processing, computer vision, etc. The growing size of …

Fusing in-storage and near-storage acceleration of convolutional neural networks

I Okafor, AK Ramanathan, NR Challapalle, Z Li… - ACM Journal on …, 2023 - dl.acm.org
Video analytics has a wide range of applications and has attracted much interest over the
years. While it can be both computationally and energy-intensive, video analytics can greatly …

Enabling Large Dynamic Neural Network Training with Learning-based Memory Management

J Ren, D Xu, S Yang, J Zhao, Z Li… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Dynamic neural networks (DyNNs) enable high computational efficiency and strong
representation capability. However, training a DyNN can face a memory capacity problem …