FlashNeuron: SSD-enabled large-batch training of very deep neural networks

J Bae, J Lee, Y Jin, S Son, S Kim, H Jang… - … USENIX Conference on …, 2021 - usenix.org
Deep neural networks (DNNs) are widely used in various AI application domains such as
computer vision, natural language processing, autonomous driving, and bioinformatics. As …

Behemoth: a flash-centric training accelerator for extreme-scale DNNs

S Kim, Y Jin, G Sohn, J Bae, TJ Ham… - 19th USENIX Conference …, 2021 - usenix.org
The explosive growth of deep neural network (DNN) model sizes drives the need for
larger memory capacity. This trend is particularly true for models in natural language …

Stannis: low-power acceleration of DNN training using computational storage devices

A HeydariGorji, M Torabzadehkashi… - 2020 57th ACM/IEEE …, 2020 - ieeexplore.ieee.org
Computational storage devices enable data to be processed in place, inside the storage device. These
devices contain 64-bit application processors and hardware accelerators that can help …

OptimStore: In-storage optimization of large-scale DNNs with on-die processing

J Kim, M Kang, Y Han, YG Kim… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Training deep neural network (DNN) models is a resource-intensive, iterative process. For
this reason, complex optimizers such as Adam are now widely adopted, as they increase the …

Stronghold: fast and affordable billion-scale deep learning model training

X Sun, W Wang, S Qiu, R Yang… - … Conference for High …, 2022 - ieeexplore.ieee.org
Deep neural networks (DNNs) with billion-scale parameters have demonstrated impressive
performance in solving many tasks. Unfortunately, training a billion-scale DNN is out of the …

Mini-batch serialization: CNN training with inter-layer data reuse

S Lym, A Behroozi, W Wen, G Li… - … of Machine Learning …, 2019 - proceedings.mlsys.org
Training convolutional neural networks (CNNs) requires intense computation and high
memory bandwidth. We find that bandwidth today is over-provisioned because most memory …

Harmony: Overcoming the hurdles of GPU memory capacity to train massive DNN models on commodity servers

Y Li, A Phanishayee, D Murray, J Tarnawski… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep neural networks (DNNs) have grown exponentially in size over the past decade,
leaving only those who have massive datacenter-based resources with the ability to develop …

Dynamic memory management for GPU-based training of deep neural networks

SB Shriram, A Garg, P Kulkarni - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Deep learning has been widely adopted for different applications of artificial intelligence:
speech recognition, natural language processing, computer vision, etc. The growing size of …

GradPIM: A practical processing-in-DRAM architecture for gradient descent

H Kim, H Park, T Kim, K Cho, E Lee… - … Symposium on High …, 2021 - ieeexplore.ieee.org
In this paper, we present GradPIM, a processing-in-memory architecture that accelerates
parameter updates of deep neural network training. As one of processing-in-memory …

Enabling Large Dynamic Neural Network Training with Learning-based Memory Management

J Ren, D Xu, S Yang, J Zhao, Z Li… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Dynamic neural networks (DyNNs) enable high computational efficiency and strong
representation capability. However, training DyNNs can face a memory capacity problem …