FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

SQ Zhang, B McDanel, HT Kung - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network
(DNN) training by providing a wide dynamic range via a shared exponent across a group of …
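
Only the two-line snippet is visible, so the following is a minimal sketch of the shared-exponent idea the abstract names: one exponent per group of values, per-value mantissas, and stochastic rounding. The group size, mantissa width, and function names are illustrative assumptions, not the FAST implementation.

```python
import numpy as np

def bfp_quantize(group, mantissa_bits=4, rng=np.random.default_rng(0)):
    """Quantize one group of values to BFP with a single shared exponent."""
    max_abs = np.max(np.abs(group))
    if max_abs == 0:
        return np.zeros_like(group)
    # Shared exponent: smallest power of two covering the group's range.
    shared_exp = np.ceil(np.log2(max_abs))
    # One LSB of the signed mantissa under that exponent.
    lsb = 2.0 ** (shared_exp - (mantissa_bits - 1))
    scaled = group / lsb
    # Stochastic rounding: round up with probability equal to the fractional
    # part, so the quantizer is unbiased in expectation.
    lo = np.floor(scaled)
    mant = lo + (rng.random(group.shape) < (scaled - lo))
    # Saturate to the signed mantissa range (at most one LSB of clipping).
    lim = 2 ** (mantissa_bits - 1)
    return np.clip(mant, -lim, lim - 1) * lsb

# Example: a 16-element group shares one exponent.
x = np.random.default_rng(1).standard_normal(16).astype(np.float32)
print(bfp_quantize(x))
```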

HNPU-V1: An Adaptive DNN Training Processor Utilizing Stochastic Dynamic Fixed-Point and Active Bit-Precision Searching

D Han, HJ Yoo - On-Chip Training NPU-Algorithm, Architecture and …, 2023 - Springer
This chapter presents HNPU, an energy-efficient DNN training processor built on algorithm–hardware co-design. The HNPU supports stochastic dynamic fixed-point …
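
As a rough illustration of what "stochastic dynamic fixed-point" can mean, the sketch below combines stochastic rounding with a binary point that shifts based on tensor statistics. The adaptation policy, thresholds, and names are assumptions for illustration; HNPU's active bit-precision search is not reproduced here.

```python
import numpy as np

def fxp_quantize_stochastic(x, word_bits, frac_bits, rng):
    """Signed fixed-point quantization with stochastic rounding."""
    scale = 2.0 ** frac_bits
    scaled = x * scale
    lo = np.floor(scaled)
    q = lo + (rng.random(x.shape) < (scaled - lo))
    lim = 2 ** (word_bits - 1)
    return np.clip(q, -lim, lim - 1) / scale

def adapt_frac_bits(x, word_bits, frac_bits, overflow_budget=0.01):
    """Move the binary point from tensor statistics (a toy policy)."""
    max_val = (2 ** (word_bits - 1) - 1) / 2.0 ** frac_bits
    if np.mean(np.abs(x) > max_val) > overflow_budget:
        return frac_bits - 1   # saturating too often: widen integer part
    if np.max(np.abs(x)) < max_val / 2:
        return frac_bits + 1   # unused headroom: reclaim precision
    return frac_bits

# Per-step usage: adapt the format, then quantize under it.
rng = np.random.default_rng(0)
acts = rng.standard_normal(1024) * 3.0
fb = adapt_frac_bits(acts, word_bits=8, frac_bits=4)
print(fb, fxp_quantize_stochastic(acts, 8, fb, rng)[:4])
```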

T-PIM: An energy-efficient processing-in-memory accelerator for end-to-end on-device training

J Heo, J Kim, S Lim, W Han… - IEEE Journal of Solid-State …, 2022 - ieeexplore.ieee.org
Recently, on-device training has become crucial for the success of edge intelligence.
However, frequent data movement between computing units and memory during training …

An energy-efficient transformer processor exploiting dynamic weak relevances in global attention

Y Wang, Y Qin, D Deng, J Wei, Y Zhou… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
Transformer-based models achieve tremendous success in many artificial intelligence (AI) tasks, outperforming conventional convolutional neural networks (CNNs) from natural …
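
The title suggests skipping attention work for query-key pairs whose softmax scores are weak. Below is a minimal sketch of threshold-based score pruning, assuming a fixed threshold and renormalization over the surviving entries; the processor's actual relevance criterion may differ.

```python
import numpy as np

def pruned_attention(q, k, v, threshold=0.02):
    """Softmax attention that drops weakly relevant query-key pairs."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Keep strong relevances (and always the per-row maximum so no row
    # empties out); hardware would skip the MACs for the dropped pairs.
    mask = (probs >= threshold) | (probs == probs.max(axis=-1, keepdims=True))
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 32)) for _ in range(3))
out = pruned_attention(q, k, v)
```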

ETA: An efficient training accelerator for DNNs based on hardware-algorithm co-optimization

J Lu, C Ni, Z Wang - IEEE Transactions on Neural Networks and …, 2022 - ieeexplore.ieee.org
Recently, the efficient training of deep neural networks (DNNs) on resource-constrained
platforms has attracted increasing attention for protecting user privacy. However, it is still a …

CAMEL: Co-Designing AI Models and eDRAMs for Efficient On-Device Learning

SQ Zhang, T Tambe, N Cuevas… - … Symposium on High …, 2024 - ieeexplore.ieee.org
On-device learning allows AI models to adapt to user data, thereby enhancing service
quality on edge platforms. However, training AI on resource-limited devices poses significant …

An Overview of Energy-Efficient DNN Training Processors

D Han, HJ Yoo - On-Chip Training NPU-Algorithm, Architecture and …, 2023 - Springer
Many edge/mobile devices are now able to utilize deep neural networks (DNNs) thanks to
the development of mobile DNN accelerators. Mobile DNN accelerators overcame the …

A 4.69-TOPS/W Training, 2.34-μJ/Image Inference On-Chip Training Accelerator With Inference-Compatible Backpropagation and Design Space Exploration in 28 …

J Qian, H Ge, Y Lu, W Shan - IEEE Journal of Solid-State …, 2024 - ieeexplore.ieee.org
On-chip training (OCT) accelerators improve personalized recognition accuracy while
ensuring user privacy. However, previous OCT accelerators often required significant …

THETA: A high-efficiency training accelerator for DNNs with triple-side sparsity exploration

J Lu, J Huang, Z Wang - … on Very Large Scale Integration (VLSI …, 2022 - ieeexplore.ieee.org
Training deep neural networks (DNNs) on edge devices has attracted increasing attention in real-world applications for domain adaptation and privacy protection. However, deploying …

PL-NPU: An energy-efficient edge-device DNN training processor with posit-based logarithm-domain computing

Y Wang, D Deng, L Liu, S Wei… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Training deep neural networks (DNNs) on edge devices is a practical way to improve model adaptivity on unfamiliar datasets while avoiding privacy disclosure and huge communication costs …
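
The title points at logarithm-domain computing, where a multiply turns into an add on log-encoded operands. Below is a minimal sketch of that idea with a plain log2 encoding; the posit format itself and the accumulation path are omitted, and the encoding and names here are assumptions for illustration.

```python
import numpy as np

def to_log_domain(x, eps=1e-30):
    """Encode each value as a sign and a log2 magnitude."""
    sign = np.sign(x)
    logmag = np.log2(np.maximum(np.abs(x), eps))
    return sign, logmag

def log_domain_multiply(a, b):
    """Multiplier-free multiply: signs multiply, log-magnitudes add."""
    sa, la = to_log_domain(a)
    sb, lb = to_log_domain(b)
    return sa * sb * np.exp2(la + lb)

x = np.array([1.5, -0.25, 3.0])
w = np.array([0.5, 2.0, -1.0])
print(log_domain_multiply(x, w))   # matches x * w
```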