Accelerating deep learning training on high-performance computing with storage tiering

MFL Dantas - 2022 - repositorium.sdum.uminho.pt
Deep Learning (DL) has become fundamental to the advancement of several areas, such as
computer vision, natural language processing and expert systems. Utilizing DL techniques …

Accelerating deep learning training through transparent storage tiering

M Dantas, D Leitão, P Cui, R Macedo… - 2022 22nd IEEE …, 2022 - ieeexplore.ieee.org
We present Monarch, a framework-agnostic storage middleware that transparently employs
storage tiering to accelerate Deep Learning (DL) training. It leverages existing storage tiers …

Monarch: Hierarchical storage management for deep learning frameworks

M Dantas, D Leitão, C Correia… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Due to convenience and usability, many deep learning (DL) jobs resort to the available
shared parallel file system (PFS) for storing and accessing training data when running in …

The Case for Storage Optimization Decoupling in Deep Learning Frameworks

R Macedo, C Correia, M Dantas, C Brito… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Deep Learning (DL) training requires efficient access to large collections of data, leading DL
frameworks to implement individual I/O optimizations to take full advantage of storage …

SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

RIS Khan, AH Yazdani, Y Fu, AK Paul, B Ji… - … USENIX Conference on …, 2023 - usenix.org
Deep learning training (DLT) applications exhibit unique I/O workload behaviors that pose
new challenges for storage system design. DLT is I/O intensive since data samples need to …

Mitigating the Impact of Tail Latency of Storage Systems on Scalable Deep Learning Applications

H Ohtsuji, E Hayashi, N Fukumoto, E Yoshida… - 2019 - pdsw.org
Massive scale deep learning enables HPC systems to finish the training of the large-scale
data sets (e.g., ImageNet) in several tens of seconds [1]. Therefore, even a small tail latency of …

Data-aware storage tiering for deep learning

C Xu, S Bhattacharya, M Foltin, S Byna… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
DNN models trained with very large datasets can perform rich deep learning tasks with high
accuracy. However, feeding huge volumes of training data exerts significant pressure on IO …

Large-scale I/O Models for Traditional and Emerging HPC Workloads on Next-Generation HPC Storage Systems

WD Chien - 2022 - diva-portal.org
The ability to create value from large-scale data is now an essential part of research and
driving technological development everywhere from everyday technology to life-saving …

Accelerating deep learning training: a storage perspective

J Mohan - 2021 - repositories.lib.utexas.edu
Deep Learning, specifically Deep Neural Networks (DNNs), is stressing storage
systems in new ways, moving the training bottleneck to the data pipeline (fetching, pre …

DLIO: A Data-Centric Benchmark for Deep Learning Applications

H Zheng, V Vishwanath - 2023 - osti.gov
SF-22-136 Deep learning has been shown as a successful method for various tasks, and its
popularity results in numerous open-source deep learning software tools. Deep learning has …