Resource constrained neural network training

M Pietrołaj, M Blok - Scientific Reports, 2024 - nature.com
Modern applications of neural-network-based AI solutions tend to move from datacenter
backends to low-power edge devices. Environmental, computational, and power constraints …

A 4.69-TOPS/W Training, 2.34-μJ/Image Inference On-Chip Training Accelerator With Inference-Compatible Backpropagation and Design Space Exploration in 28 …

J Qian, H Ge, Y Lu, W Shan - IEEE Journal of Solid-State …, 2024 - ieeexplore.ieee.org
On-chip training (OCT) accelerators improve personalized recognition accuracy while
ensuring user privacy. However, previous OCT accelerators often required significant …
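
The snippet stops before describing the datapath, so the following is only a generic illustration (not this paper's architecture) of why backpropagation can be made "inference-compatible" on a MAC-based accelerator: the backward pass of a fully connected layer is itself just two more matrix multiplications, the same primitive the inference engine already runs.

```python
import numpy as np

def linear_forward(x, w):
    return x @ w                      # inference: (batch, in) @ (in, out)

def linear_backward(x, w, grad_y):
    """Gradients of a fully connected layer; both are plain matmuls,
    so they map onto the same MAC array as the forward pass."""
    grad_x = grad_y @ w.T             # same MAC pattern, transposed weights
    grad_w = x.T @ grad_y             # weight gradient, again a matmul
    return grad_x, grad_w

x = np.random.randn(4, 8)
w = np.random.randn(8, 3)
gy = np.random.randn(4, 3)
gx, gw = linear_backward(x, w, gy)
print(gx.shape, gw.shape)             # (4, 8) (8, 3)
```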

An efficient deep-learning-based super-resolution accelerating SoC with heterogeneous accelerating and hierarchical cache

Z Li, S Kim, D Im, D Han, HJ Yoo - IEEE Journal of Solid-State …, 2022 - ieeexplore.ieee.org
This article presents an energy-efficient accelerating system-on-chip (SoC) for super-resolution (SR) image reconstruction on a mobile platform. With the rise of contactless …

PL-NPU: An energy-efficient edge-device DNN training processor with posit-based logarithm-domain computing

Y Wang, D Deng, L Liu, S Wei… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Edge-device deep neural network (DNN) training is a practical way to improve model adaptivity for unfamiliar datasets while avoiding privacy disclosure and huge communication costs …
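
The snippet does not show the arithmetic itself; the sketch below only illustrates the general idea behind logarithm-domain computing, where multiplications become additions of log-magnitudes, and does not model the paper's posit format or hardware.

```python
import numpy as np

def log_domain_matmul(a, b, eps=1e-30):
    """Illustrative log-domain matrix multiply: each product is computed as
    2**(log2|a| + log2|b|), with signs tracked separately. Software sketch
    of the general idea only, not the PL-NPU datapath."""
    sign_a, sign_b = np.sign(a), np.sign(b)
    log_a = np.log2(np.abs(a) + eps)
    log_b = np.log2(np.abs(b) + eps)
    out = np.zeros((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            prods = sign_a[i, :] * sign_b[:, j] * 2.0 ** (log_a[i, :] + log_b[:, j])
            out[i, j] = prods.sum()
    return out

a = np.random.randn(4, 3).astype(np.float32)
b = np.random.randn(3, 5).astype(np.float32)
print(np.max(np.abs(log_domain_matmul(a, b) - a @ b)))  # small numerical error
```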

Processing-in-memory technology for machine learning: From basic to ASIC

B Taylor, Q Zheng, Z Li, S Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Due to the need for computing models that can process large quantities of data efficiently
and with high throughput in many state-of-the-art machine learning algorithms, the …
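
For readers new to the topic, the core processing-in-memory primitive is an in-place matrix-vector product: row voltages drive cells whose conductances encode weights, and column currents sum the products. The code below is a purely numerical, idealized illustration of that idea and does not correspond to any specific device covered by the survey.

```python
import numpy as np

def crossbar_mvm(weights, inputs, bits=4):
    """Idealized analog-crossbar matrix-vector multiply: weights are
    quantized to a few conductance levels, inputs act as row voltages,
    and each column current is a dot product (Ohm's + Kirchhoff's laws)."""
    w_max = np.max(np.abs(weights))
    levels = 2 ** bits - 1
    g = np.round(weights / w_max * levels) / levels * w_max  # conductance quantization
    return inputs @ g   # column currents = sum_i V_i * G_ij

w = np.random.randn(8, 4)
x = np.random.randn(8)
print(crossbar_mvm(w, x))
print(x @ w)            # close, up to the conductance quantization error
```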

RedMulE: A mixed-precision matrix–matrix operation engine for flexible and energy-efficient on-chip linear algebra and TinyML training acceleration

Y Tortorella, L Bertaccini, L Benini, D Rossi… - Future Generation …, 2023 - Elsevier
The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to …
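
The snippet is cut off before any datapath details; as an assumed software analogue rather than the RedMulE design itself, a common mixed-precision GEMM recipe for low-power training keeps FP16 operands and accumulates in FP32.

```python
import numpy as np

def mixed_precision_matmul(a, b):
    """FP16 operands, FP32 accumulation: a typical mixed-precision GEMM
    recipe for on-device training. Sketch only; not the RedMulE datapath."""
    a16 = a.astype(np.float16)   # operands stored/rounded to FP16
    b16 = b.astype(np.float16)
    # products and the running sum are kept in FP32 to limit error growth
    return np.matmul(a16.astype(np.float32), b16.astype(np.float32))

a = np.random.randn(16, 32)
b = np.random.randn(32, 8)
err = np.max(np.abs(mixed_precision_matmul(a, b) - a @ b))
print(f"max abs error vs. FP64 reference: {err:.2e}")
```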

Toward efficient low-precision training: Data format optimization and hysteresis quantization

S Lee, J Park, D Jeon - International Conference on Learning …, 2022 - openreview.net
As the complexity and size of deep neural networks continue to increase, low-precision
training has been extensively studied in the last few years to reduce hardware overhead …
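
The truncated abstract does not define the scheme, so the sketch below only illustrates what hysteresis in quantization generally means: the value is rounded toward the previously quantized level, so small oscillations in the high-precision value do not flip the quantized one. The exact rule and data formats in the paper may differ.

```python
import numpy as np

def hysteresis_quantize(x, q_prev, step=2.0 ** -4):
    """Quantize x to multiples of `step`, rounding toward the previously
    quantized value q_prev: floor when x is above it, ceil when below.
    The output only moves once x has fully crossed a level, damping
    oscillation across training iterations. Generic sketch only."""
    x_scaled = x / step
    q_scaled = np.where(x_scaled >= q_prev / step,
                        np.floor(x_scaled),   # increasing: round down, toward q_prev
                        np.ceil(x_scaled))    # decreasing: round up, toward q_prev
    return q_scaled * step

q = np.zeros(1)
for x in [0.030, 0.033, 0.029, 0.034, 0.070]:
    q = hysteresis_quantize(np.array([x]), q)
    print(x, q[0])
# round-to-nearest would flip 0 / 0.0625 / 0 / 0.0625 / 0.0625 here;
# hysteresis stays at 0 until x clearly crosses the next level (0.070)
```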

ACBN: Approximate calculated batch normalization for efficient DNN on-device training processor

B Li, H Wang, F Luo, X Zhang, H Sun… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Batch normalization (BN) has been established as a very effective component in deep
learning, largely helping accelerate the convergence of deep neural network (DNN) training …
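
The abstract is truncated before the approximation is defined. As a generic illustration of the kind of hardware-friendly shortcut involved, and not the ACBN method itself, the sketch below replaces the exact variance computation with a range-based standard-deviation estimate.

```python
import numpy as np

def range_batchnorm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch norm with an approximate scale: the per-feature standard
    deviation is estimated from the value range instead of the exact
    square/accumulate/sqrt chain. Illustration only, not ACBN."""
    n = x.shape[0]
    mean = x.mean(axis=0)
    # for roughly Gaussian activations, E[max - min] is about 2*sigma*sqrt(2*ln n)
    sigma_approx = (x.max(axis=0) - x.min(axis=0)) / (2.0 * np.sqrt(2.0 * np.log(n)))
    return gamma * (x - mean) / (sigma_approx + eps) + beta

x = np.random.randn(256, 8) * 3.0 + 1.0
y = range_batchnorm(x)
print(x.std(axis=0).round(2))                      # exact per-feature std, ~3.0
print(y.mean(axis=0).round(3), y.std(axis=0).round(2))
# near zero-mean; std is close to (not exactly) 1, since the range-based
# constant is only an asymptotic approximation
```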

Low-overhead inverted LUT design for bounded DNN activation functions on floating-point vector ALUs

SY Kim, CH Kim, WJ Lee, I Park, SW Kim - Microprocessors and …, 2022 - Elsevier
An inference engine uses floating-point numbers to provide high accuracy in deep neural
network computing despite its computing resource limitations. However, the computation for …
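
The snippet does not reach the LUT construction; below is a generic table-lookup approximation of a bounded activation (a sigmoid with saturation). It illustrates why bounded functions suit small lookup tables but does not reproduce the paper's inverted-LUT indexing.

```python
import numpy as np

# Generic LUT sigmoid: outside [-8, 8] the function has saturated, so a
# 256-entry table with clamped indexing covers the useful range.
X_MIN, X_MAX, ENTRIES = -8.0, 8.0, 256
_lut_x = np.linspace(X_MIN, X_MAX, ENTRIES, dtype=np.float32)
_lut_y = (1.0 / (1.0 + np.exp(-_lut_x))).astype(np.float32)

def sigmoid_lut(x):
    """Piecewise-constant sigmoid via table lookup with saturation."""
    idx = ((x - X_MIN) / (X_MAX - X_MIN) * (ENTRIES - 1)).astype(int)
    return _lut_y[np.clip(idx, 0, ENTRIES - 1)]

x = np.linspace(-12, 12, 7).astype(np.float32)
print(sigmoid_lut(x))
print(1.0 / (1.0 + np.exp(-x)))   # reference values
```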

All-Digital Computing-in-Memory Macro Supporting FP64-Based Fused Multiply-Add Operation

D Li, K Mo, L Liu, B Pan, W Li, W Kang, L Li - Applied Sciences, 2023 - mdpi.com
Recently, frequent data movement between computing units and memory during floating-point arithmetic has become a major problem for scientific computing. Computing-in-memory …
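
The snippet ends before the macro details; the accuracy argument for fused multiply-add is that a*b + c is rounded once instead of twice. The emulation below uses FP32 operands purely for illustration (an assumption; it is unrelated to the macro's FP64 circuit) to show the difference the single rounding can make.

```python
import numpy as np

def separate_mul_add_fp32(a, b, c):
    """Multiply then add, rounding to FP32 after each step."""
    prod = np.float32(a) * np.float32(b)     # first rounding
    return prod + np.float32(c)              # second rounding

def fused_mul_add_fp32(a, b, c):
    """Emulated FMA: FP64 holds the product of two FP32 values exactly,
    so essentially only the final conversion back to FP32 rounds."""
    return np.float32(np.float64(np.float32(a)) * np.float64(np.float32(b))
                      + np.float64(np.float32(c)))

a, b, c = 1.0000001, 1.0000001, -1.0000002
print(separate_mul_add_fp32(a, b, c))   # 0.0: the small term is lost to cancellation
print(fused_mul_add_fp32(a, b, c))      # ~1.42e-14: preserved by the single rounding
```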