SP-PIM: A 22.41 TFLOPS/W, 8.81 Epochs/Sec Super-Pipelined Processing-In-Memory Accelerator with Local Error Prediction for On-Device Learning

JH Kim, J Heo, W Han, J Kim, JY Kim
2023 IEEE Symposium on VLSI Technology and Circuits, 2023 - ieeexplore.ieee.org
This paper presents SP-PIM, which demonstrates real-time on-device learning based on a holistic, multi-level pipelining scheme enabled by local error prediction. It introduces a local error prediction unit that makes the training algorithm pipelineable while reducing computation overhead and overall external memory access through power-of-two arithmetic operations and random weights. Its double-buffered PIM macro performs both forward propagation and gradient calculation, while its dual-sparsity-aware circuits exploit sparsity in both activations and errors. Finally, the 5.76mm² SP-PIM chip, fabricated in a 28nm process, achieves 8.81 Epochs/Sec on-chip model training with state-of-the-art area efficiency of 560.6 GFLOPS/mm² and power efficiency of 22.4 TFLOPS/W.
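The power-of-two arithmetic mentioned in the abstract is a standard trick for cheapening multiply-accumulate hardware: when weights are constrained to signed powers of two, each multiplication collapses into a bit shift plus a sign flip. A minimal sketch of the idea in software (illustrative only, not SP-PIM's actual circuit; function names and the nonzero-weight assumption are mine):

```python
import math

def quantize_pow2(w: float) -> int:
    """Round a nonzero weight's magnitude to the nearest power of two,
    returning the exponent. Sign is handled by the caller."""
    return round(math.log2(abs(w)))

def shift_multiply(x: int, w: float) -> int:
    """Multiply an integer activation by a power-of-two-quantized weight
    using only a shift and a sign flip, i.e. no hardware multiplier."""
    exp = quantize_pow2(w)
    y = x << exp if exp >= 0 else x >> -exp
    return -y if w < 0 else y

# A full multiply 12 * 0.25 becomes a right shift: 12 >> 2 == 3
print(shift_multiply(12, 0.25))   # -> 3
print(shift_multiply(12, -2.0))   # -> -24
```

In hardware, the shift amount is just the stored exponent, so the local error prediction path can be far smaller and lower-power than a path built around full-precision multipliers.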