Optimizing the Training of Co-Located Deep Learning Models Using Cache-Aware Staggering

K Assogba, B Nicolae… - 2023 IEEE 30th …, 2023 - ieeexplore.ieee.org
Despite significant advances, training deep learning models remains a time-consuming and
resource-intensive task. One of the key challenges in this context is the ingestion of the …

TensorSocket: Shared Data Loading for Deep Learning Training

T Robroek, NK Nielsen, P Tözün - arXiv preprint arXiv:2409.18749, 2024 - arxiv.org
Training deep learning models is a repetitive and resource-intensive process. Data
scientists often train several models before landing on a set of parameters (e.g., hyper …

EdgeServe: Efficient deep learning model caching at the edge

T Guo, RJ Walls, SS Ogden - Proceedings of the 4th ACM/IEEE …, 2019 - dl.acm.org
In this work, we look at how to effectively manage and utilize deep learning models at each
edge location, to provide performance guarantees to inference requests. We identify …

Accelerating deep learning inference via learned caches

A Balasubramanian, A Kumar, Y Liu, H Cao… - arXiv preprint arXiv …, 2021 - arxiv.org
Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains
owing to their high accuracy in solving real-world problems. However, this high accuracy …

Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads (Information System Architectures)

K Nagrecha, A Kumar - 2023 - adalabucsd.github.io
Large models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering
applications that have captured the public's imagination. Such models must be trained on …

Hoard: A distributed data caching system to accelerate deep learning training on the cloud

C Pinto, Y Gkoufas, A Reale, S Seelam… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep Learning system architects strive to design a balanced system where the
computational accelerator (FPGA, GPU, etc.) is not starved for data. Feeding training data …

Intermediate data caching optimization for multi-stage and parallel big data frameworks

Z Yang, D Jia, S Ioannidis, N Mi… - 2018 IEEE 11th …, 2018 - ieeexplore.ieee.org
In the era of big data and cloud computing, large amounts of data are generated from user
applications and need to be processed in the datacenter. Data-parallel computing …

Partitioned Neural Network Training via Synthetic Intermediate Labels

CV Karadağ, N Topaloğlu - arXiv preprint arXiv:2403.11204, 2024 - arxiv.org
The proliferation of extensive neural network architectures, particularly deep learning
models, presents a challenge in terms of resource-intensive training. GPU memory …

Enabling efficient large-scale deep learning training with cache coherent disaggregated memory systems

Z Wang, J Sim, E Lim, J Zhao - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Modern deep learning (DL) training is memory-intensive, constrained by the memory
capacity of each computation component and cross-device communication bandwidth. In …

COS: Cross-Processor Operator Scheduling for Multi-Tenant Deep Learning Inference

C Lin, J Liu - 2024 IEEE/ACM 32nd International Symposium on …, 2024 - ieeexplore.ieee.org
Multi-tenant inference, now a prevalent inference paradigm, requires deploying
multiple deep learning models on the same hardware platform to concurrently process inference …