Efficient memory management for large language model serving with pagedattention

W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng… - Proceedings of the 29th …, 2023 - dl.acm.org
… In summary, GPU workers do not need to synchronize on memory management as they only
need to receive all the memory management information at the beginning of each decoding …

Efficient memory management for deep neural net inference

Y Pisarchyk, J Lee - arXiv preprint arXiv:2001.03288, 2020 - arxiv.org
… However, the authors do not focus on the core problem of memory management and do not
explore different algorithms that can solve this problem in the most effective way. (Chen et al.…

The Impact of Cache and Dynamic Memory Management in Static Dataflow Applications

A Ghasemi, M Ruaro, R Cataldo, JP Diguet… - Journal of Signal …, 2022 - Springer
… Additionally, this work investigates the adoption of two memory management strategies
for dataflow applications: Copy-on-Write (CoW) and Non-Temporal Memory transfers (NTM). …

Efficient memory management for gpu-based deep learning systems

J Zhang, SH Yeung, Y Shu, B He, W Wang - arXiv preprint arXiv …, 2019 - arxiv.org
… In this section, we introduce the unified program abstraction for implementing SmartPool
and AutoSwap, which could be adopted into an existing deep learning framework with ease. …

Characterization of Memory Access in Deep Learning and Its Implications in Memory Management.

J Lee, H Bahn - Computers, Materials & Continua, 2023 - cdn.techscience.cn
Abstract: Due to the recent trend of software intelligence in the Fourth … memory capacity
efficiently for deep learning workloads becomes important. In this paper, we analyze memory …

On cache limits for dataflow applications and related efficient memory management strategies

A Ghasemi, R Cataldo, JP Diguet… - Workshop on Design and …, 2021 - dl.acm.org
… This paper presents two efficient memory management … , to address the memory aspects
of the dataflow model: copy-on… Sniper: exploring the level of abstraction for scalable and …

Capuchin: Tensor-based gpu memory management for deep learning

X Peng, X Shi, H Dai, H Jin, W Ma, Q Xiong… - Proceedings of the …, 2020 - dl.acm.org
… GPU memory management based on computation graph and characteristics of different
layers. However, a layer of neural networks is a high-level computation abstraction composed of …

Smart memory management (SaMM) for embedded systems without MMU

K Bukkapatnam, CK Rekha… - IOP Conference …, 2020 - iopscience.iop.org
Abstract. In the wake of extensible … Memory Management solution has to be very fast as
well as stable. This paper proposes a Memory Management Scheme, which minimises memory …

Nimble page management for tiered memory systems

Z Yan, D Lustig, D Nellans… - Proceedings of the Twenty …, 2019 - dl.acm.org
… memory bandwidth. To remedy these shortcomings, we propose and implement a general
purpose OS-integrated multi-level memory management … an abstract example of the memory …

Improvements of Memory Management in KLEE

J Novák - 2020 - is.muni.cz
… created when no external call is executed, overall memory management was improved,
supporting programs which require more memory to be allocated. This thesis focused on solving …