MPress: Democratizing billion-scale model training on multi-GPU servers via memory-saving inter-operator parallelism

Q Zhou, H Wang, X Yu, C Li, Y Bai… - … Symposium on High …, 2023 - ieeexplore.ieee.org
It remains challenging to train billion-scale DNN models on a single modern multi-GPU
server due to the GPU memory wall. Unfortunately, existing memory-saving techniques such …
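For context, inter-operator parallelism places different operators (layers) of one model on different GPUs, so no single device has to hold all parameters and activations. Below is a minimal PyTorch sketch of that general idea, assuming two CUDA devices are available; it is not MPress's system, which additionally makes memory-saving placement decisions.

```python
import torch
import torch.nn as nn

class TwoStageNet(nn.Module):
    """Toy two-stage model split across two GPUs (naive inter-operator parallelism)."""
    def __init__(self):
        super().__init__()
        # Each stage's parameters and activations live on its own device.
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Sequential(nn.Linear(4096, 1024), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # Only this boundary activation crosses the GPU-GPU link.
        return self.stage1(x.to("cuda:1"))

model = TwoStageNet()
out = model(torch.randn(8, 1024))
out.sum().backward()  # autograd routes gradients back across both devices
```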

XEngine: Optimal tensor rematerialization for neural networks in heterogeneous environments

M Schuler, R Membarth, P Slusallek - ACM Transactions on Architecture …, 2022 - dl.acm.org
Memory efficiency is crucial in training deep learning networks on resource-restricted
devices. During backpropagation, forward tensors are used to calculate gradients. Despite …
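For context, a minimal PyTorch sketch of tensor rematerialization (activation checkpointing), assuming a recent PyTorch version: only segment-boundary tensors are kept during the forward pass, and interior activations are recomputed during backward. This shows the general technique only, not XEngine's optimal scheduling across heterogeneous devices.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Eight blocks; without checkpointing, every block's output stays
# alive until backward needs it to compute gradients.
model = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                        for _ in range(8)])
x = torch.randn(32, 512, requires_grad=True)

# With two segments, only the segment boundaries are stored; interior
# activations are recomputed (rematerialized) during backward, trading
# extra FLOPs for a smaller peak memory footprint.
y = checkpoint_sequential(model, 2, x, use_reentrant=False)
y.sum().backward()
```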

Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU

J Liao, M Li, H Yang, Q Sun, B Sun… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Larger deep learning models usually lead to higher model quality, but at the cost of an ever-increasing GPU memory footprint. Although several tensor checkpointing techniques have …
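To illustrate why input tensor dynamics matter: when input shapes vary across iterations, a fixed checkpointing plan recomputes activations even in iterations with low memory pressure. Below is a toy PyTorch sketch, not the paper's algorithm, that checkpoints a block only when the current input exceeds a hypothetical size threshold (`budget_elems` is an assumed parameter for illustration).

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class AdaptiveBlock(nn.Module):
    """Checkpoint a block only when its input is large this iteration."""
    def __init__(self, dim, budget_elems=1_000_000):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.budget_elems = budget_elems  # hypothetical threshold, not from the paper

    def forward(self, x):
        if x.requires_grad and x.numel() > self.budget_elems:
            # Large input: store only x, recompute the block in backward.
            return checkpoint(self.body, x, use_reentrant=False)
        # Small input: keep activations, avoid the recomputation cost.
        return self.body(x)

block = AdaptiveBlock(512)
for batch in (torch.randn(16, 512, requires_grad=True),     # small: no checkpoint
              torch.randn(4096, 512, requires_grad=True)):  # large: checkpointed
    block(batch).sum().backward()
```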

[BOOK][B] Compiler and Runtime Techniques for Optimizing Deep Learning Applications

SS Lyubomirsky - 2022 - search.proquest.com
As the scaling and performance demands for deep learning systems have grown, system
designers have struggled to incorporate innovations at opposite ends of the system stack …