A Survey on Inference Optimization Techniques for Mixture of Experts Models

J Liu, P Tang, W Wang, Y Ren, X Hou, PA Heng… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of large-scale Mixture of Experts (MoE) models has marked a significant
advancement in artificial intelligence, offering enhanced model capacity and computational …
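The surveyed MoE systems build on sparse top-k expert routing, which the snippet only alludes to. Below is a minimal illustrative sketch of such a layer, written for this listing rather than taken from the survey; the class name `MoELayer`, the dimensions, and the `top_k` value are all assumptions chosen for the example.

```python
# Minimal sketch of a token-level top-k gated MoE layer (illustration only; not from the survey).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)   # router producing per-expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)       # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # each token is routed to k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():                      # only run the expert on its assigned tokens
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

tokens = torch.randn(16, 512)
print(MoELayer()(tokens).shape)   # torch.Size([16, 512])
```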

MTM: Rethinking Memory Profiling and Migration for Multi-Tiered Large Memory

J Ren, D Xu, J Ryu, K Shin, D Kim, D Li - Proceedings of the Nineteenth …, 2024 - dl.acm.org
Multi-terabyte large memory systems are often characterized by more than two memory tiers
with different latency and bandwidth. Multi-tiered large memory systems call for rethinking of …
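As a rough illustration of the profile-then-migrate idea behind tiered-memory management (not MTM's actual policy), the toy simulation below counts page accesses over a window, then promotes the hottest pages into a small fast tier and demotes the rest; the class name, capacity, and workload are invented for the example.

```python
# Toy hot/cold page migration across two memory tiers (illustration only; not MTM's algorithm).
from collections import Counter

class TieredMemory:
    def __init__(self, fast_capacity=4):
        self.fast_capacity = fast_capacity
        self.fast = set()              # page ids currently resident in the fast tier
        self.access_counts = Counter() # profiling data for the current window

    def access(self, page):
        self.access_counts[page] += 1

    def migrate(self):
        # Rank pages by profiled access count and keep only the hottest in the fast tier.
        hottest = {p for p, _ in self.access_counts.most_common(self.fast_capacity)}
        promoted = hottest - self.fast
        demoted = self.fast - hottest
        self.fast = hottest
        self.access_counts.clear()     # start a fresh profiling window
        return promoted, demoted

mem = TieredMemory(fast_capacity=2)
for page in [1, 1, 1, 2, 2, 3, 4, 1]:
    mem.access(page)
print(mem.migrate())   # ({1, 2}, set()): pages 1 and 2 promoted, nothing demoted yet
```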

FACET: On-the-Fly Activation Compression for Efficient Transformer Training

S Lee, G Yun, XT Nguyen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Training Transformer models, known for their outstanding performance in various tasks, can
be challenging due to extensive training times and substantial memory requirements. One …
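The memory pressure mentioned in the snippet comes largely from activations saved for the backward pass. The sketch below shows one generic way to compress them on the fly, quantizing a saved activation to int8 and decompressing it only when its gradient is needed; it is an assumed illustration of the general idea, not FACET's actual compression scheme.

```python
# Illustrative sketch of int8 compression of saved activations (not FACET's codec).
import torch

class CompressedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x.clamp(min=0)
        # Store the activation as int8 plus a scale instead of the full-precision tensor.
        scale = y.abs().max() / 127.0 + 1e-12
        ctx.scale = scale
        ctx.save_for_backward((y / scale).round().to(torch.int8))
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (y_q,) = ctx.saved_tensors
        y = y_q.to(grad_out.dtype) * ctx.scale   # decompress on demand (lossy)
        return grad_out * (y > 0)                # ReLU gradient mask

x = torch.randn(4, 8, requires_grad=True)
CompressedReLU.apply(x).sum().backward()
print(x.grad.shape)   # torch.Size([4, 8])
```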

LM-Offload: Performance Model-Guided Generative Inference of Large Language Models with Parallelism Control

Large language models (LLMs) have achieved remarkable success in various natural
language processing tasks. However, LLM inference is highly computational and memory …
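A performance-model-guided offloading system of this kind must decide how much of the model stays resident on the accelerator versus being streamed from host memory. The toy model below illustrates that decision with an analytical latency estimate; all constants, the function name `best_split`, and the overlap assumption are invented for this sketch and are not LM-Offload's actual performance model.

```python
# Toy performance model for choosing a layer-offloading split (illustration only).
def best_split(num_layers=32, layer_bytes=400e6, gpu_free_bytes=8e9,
               t_compute=2e-3, pcie_bw=16e9):
    """Pick how many layers stay resident on the GPU under a memory budget,
    estimating per-step latency when the remaining layers are streamed over PCIe."""
    best = None
    for resident in range(num_layers + 1):
        if resident * layer_bytes > gpu_free_bytes:
            break                                   # would exceed the GPU memory budget
        streamed = num_layers - resident
        transfer = streamed * layer_bytes / pcie_bw
        # Assume transfers overlap with compute, so step time is bounded by the slower of the two.
        latency = max(num_layers * t_compute, transfer)
        if best is None or latency < best[1]:
            best = (resident, latency)
    return best

print(best_split())   # (resident_layers, estimated_step_latency_in_seconds)
```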