NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

G Heo, S Lee, J Cho, H Choi, S Lee, H Ham… - Proceedings of the 29th …, 2024 - dl.acm.org
Modern transformer-based Large Language Models (LLMs) are constructed with a series of
decoder blocks. Each block comprises three key components: (1) QKV generation, (2) multi …
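
The snippet describes the standard decoder-block structure that batched-inference accelerators like NeuPIMs target. Below is a minimal PyTorch sketch of such a block, showing the QKV-generation GEMM, the multi-head attention step, and a feed-forward network; it assumes a generic pre-norm decoder and all names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """One transformer decoder block: (1) QKV generation,
    (2) multi-head self-attention, (3) feed-forward network (pre-norm)."""
    def __init__(self, d_model=4096, n_heads=32):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.qkv_proj = nn.Linear(d_model, 3 * d_model)  # (1) QKV generation
        self.out_proj = nn.Linear(d_model, d_model)
        self.ffn = nn.Sequential(                        # (3) feed-forward network
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        b, t, d = x.shape
        # (1) one GEMM produces Q, K, V for all heads
        q, k, v = self.qkv_proj(self.norm1(x)).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        # (2) multi-head self-attention with a causal mask; this is the
        # memory-bound step during batched decoding
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        x = x + self.out_proj(attn)
        # (3) position-wise feed-forward network
        return x + self.ffn(self.norm2(x))
```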

pSyncPIM: Partially Synchronous Execution of Sparse Matrix Operations for All-Bank PIM Architectures

D Baek, S Hwang, J Huh - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org
Recent commercial incarnations of processing-in-memory (PIM) maintain the standard
DRAM interface and employ the all-bank mode execution to maximize bank-level memory …

Darwin: A DRAM-Based Multi-Level Processing-in-Memory Architecture for Column-Oriented Database

D Kim, JY Kim, W Han, J Won, H Choi… - … on Emerging Topics …, 2024 - ieeexplore.ieee.org
We propose Darwin, a practical LRDIMM-based multi-level Processing-in-memory (PIM)
architecture for data analytics, which exploits the internal bandwidth of DRAM using the …

LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System

H Kwon, K Koo, J Kim, W Lee, M Lee, H Lee… - arXiv preprint arXiv …, 2024 - arxiv.org
The expansion of large language models (LLMs) with hundreds of billions of parameters
presents significant challenges to computational resources, particularly data movement and …