NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

G Heo, S Lee, J Cho, H Choi, S Lee, H Ham… - Proceedings of the 29th …, 2024 - dl.acm.org
Modern transformer-based Large Language Models (LLMs) are constructed with a series of
decoder blocks. Each block comprises three key components: (1) QKV generation, (2) multi …

PIMCoSim: Hardware/Software Co-Simulator for Exploring Processing-in-Memory Architectures

J Shin, S An, S Lee, SE Lee - Electronics, 2024 - mdpi.com
As the scope of artificial intelligence (AI) expands and the structure becomes more complex,
the amount of data for inference and training has increased. In traditional computer …