Resource constrained neural network training

M Pietrołaj, M Blok - Scientific Reports, 2024 - nature.com
Modern applications of neural-network-based AI solutions tend to move from datacenter
backends to low-power edge devices. Environmental, computational, and power constraints …

A 4.69-TOPS/W Training, 2.34-μJ/Image Inference On-Chip Training Accelerator With Inference-Compatible Backpropagation and Design Space Exploration in 28 …

J Qian, H Ge, Y Lu, W Shan - IEEE Journal of Solid-State …, 2024 - ieeexplore.ieee.org
On-chip training (OCT) accelerators improve personalized recognition accuracy while
ensuring user privacy. However, previous OCT accelerators often required significant …
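
The snippet stops before describing the datapath, so the following is only a generic illustration (not this paper's architecture) of why backpropagation can be made "inference-compatible" on a MAC-based accelerator: the backward pass of a fully connected layer is itself just two more matrix multiplications, the same primitive the inference engine already runs.

```python
import numpy as np

def linear_forward(x, w):
    return x @ w                      # inference: (batch, in) @ (in, out)

def linear_backward(x, w, grad_y):
    """Gradients of a fully connected layer; both are plain matmuls,
    so they map onto the same MAC array as the forward pass."""
    grad_x = grad_y @ w.T             # same MAC pattern, transposed weights
    grad_w = x.T @ grad_y             # weight gradient, again a matmul
    return grad_x, grad_w

x = np.random.randn(4, 8)
w = np.random.randn(8, 3)
gy = np.random.randn(4, 3)
gx, gw = linear_backward(x, w, gy)
print(gx.shape, gw.shape)             # (4, 8) (8, 3)
```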

An efficient deep-learning-based super-resolution accelerating SoC with heterogeneous accelerating and hierarchical cache

Z Li, S Kim, D Im, D Han, HJ Yoo - IEEE Journal of Solid-State …, 2022 - ieeexplore.ieee.org
This article presents an energy-efficient accelerating system-on-chip (SoC) for super-resolution (SR) image reconstruction on a mobile platform. With the rise of contactless …

PL-NPU: An energy-efficient edge-device DNN training processor with posit-based logarithm-domain computing

Y Wang, D Deng, L Liu, S Wei… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Edge-device deep neural network (DNN) training is a practical way to improve model adaptivity for unfamiliar datasets while avoiding privacy disclosure and huge communication costs …
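
The snippet does not show the arithmetic itself; the sketch below only illustrates the general idea behind logarithm-domain computing, where multiplications become additions of log-magnitudes, and does not model the paper's posit format or hardware.

```python
import numpy as np

def log_domain_matmul(a, b, eps=1e-30):
    """Illustrative log-domain matrix multiply: each product is computed as
    2**(log2|a| + log2|b|), with signs tracked separately. Software sketch
    of the general idea only, not the PL-NPU datapath."""
    sign_a, sign_b = np.sign(a), np.sign(b)
    log_a = np.log2(np.abs(a) + eps)
    log_b = np.log2(np.abs(b) + eps)
    out = np.zeros((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            prods = sign_a[i, :] * sign_b[:, j] * 2.0 ** (log_a[i, :] + log_b[:, j])
            out[i, j] = prods.sum()
    return out

a = np.random.randn(4, 3).astype(np.float32)
b = np.random.randn(3, 5).astype(np.float32)
print(np.max(np.abs(log_domain_matmul(a, b) - a @ b)))  # small numerical error
```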

Processing-in-memory technology for machine learning: From basic to ASIC

B Taylor, Q Zheng, Z Li, S Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Due to the need for computing models that can process large quantities of data efficiently
and with high throughput in many state-of-the-art machine learning algorithms, the …
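
For readers new to the topic, the core processing-in-memory primitive is an in-place matrix-vector product: row voltages drive cells whose conductances encode weights, and column currents sum the products. The code below is a purely numerical, idealized illustration of that idea and does not correspond to any specific device covered by the survey.

```python
import numpy as np

def crossbar_mvm(weights, inputs, bits=4):
    """Idealized analog-crossbar matrix-vector multiply: weights are
    quantized to a few conductance levels, inputs act as row voltages,
    and each column current is a dot product (Ohm's + Kirchhoff's laws)."""
    w_max = np.max(np.abs(weights))
    levels = 2 ** bits - 1
    g = np.round(weights / w_max * levels) / levels * w_max  # conductance quantization
    return inputs @ g   # column currents = sum_i V_i * G_ij

w = np.random.randn(8, 4)
x = np.random.randn(8)
print(crossbar_mvm(w, x))
print(x @ w)            # close, up to the conductance quantization error
```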

RedMulE: A mixed-precision matrix–matrix operation engine for flexible and energy-efficient on-chip linear algebra and TinyML training acceleration

Y Tortorella, L Bertaccini, L Benini, D Rossi… - Future Generation …, 2023 - Elsevier
The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to …
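
The snippet is cut off before any datapath details; as an assumed software analogue rather than the RedMulE design itself, a common mixed-precision GEMM recipe for low-power training keeps FP16 operands and accumulates in FP32.

```python
import numpy as np

def mixed_precision_matmul(a, b):
    """FP16 operands, FP32 accumulation: a typical mixed-precision GEMM
    recipe for on-device training. Sketch only; not the RedMulE datapath."""
    a16 = a.astype(np.float16)   # operands stored/rounded to FP16
    b16 = b.astype(np.float16)
    # products and the running sum are kept in FP32 to limit error growth
    return np.matmul(a16.astype(np.float32), b16.astype(np.float32))

a = np.random.randn(16, 32)
b = np.random.randn(32, 8)
err = np.max(np.abs(mixed_precision_matmul(a, b) - a @ b))
print(f"max abs error vs. FP64 reference: {err:.2e}")
```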

Toward efficient low-precision training: Data format optimization and hysteresis quantization

S Lee, J Park, D Jeon - International Conference on Learning …, 2022 - openreview.net
As the complexity and size of deep neural networks continue to increase, low-precision
training has been extensively studied in the last few years to reduce hardware overhead …
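
The truncated abstract does not define the scheme, so the sketch below only illustrates what hysteresis in quantization generally means: the value is rounded toward the previously quantized level, so small oscillations in the high-precision value do not flip the quantized one. The exact rule and data formats in the paper may differ.

```python
import numpy as np

def hysteresis_quantize(x, q_prev, step=2.0 ** -4):
    """Quantize x to multiples of `step`, rounding toward the previously
    quantized value q_prev: floor when x is above it, ceil when below.
    The output only moves once x has fully crossed a level, damping
    oscillation across training iterations. Generic sketch only."""
    x_scaled = x / step
    q_scaled = np.where(x_scaled >= q_prev / step,
                        np.floor(x_scaled),   # increasing: round down, toward q_prev
                        np.ceil(x_scaled))    # decreasing: round up, toward q_prev
    return q_scaled * step

q = np.zeros(1)
for x in [0.030, 0.033, 0.029, 0.034, 0.070]:
    q = hysteresis_quantize(np.array([x]), q)
    print(x, q[0])
# round-to-nearest would flip 0 / 0.0625 / 0 / 0.0625 / 0.0625 here;
# hysteresis stays at 0 until x clearly crosses the next level (0.070)
```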

ACBN: Approximate calculated batch normalization for efficient DNN on-device training processor

B Li, H Wang, F Luo, X Zhang, H Sun… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Batch normalization (BN) has been established as a very effective component in deep
learning, largely helping accelerate the convergence of deep neural network (DNN) training …
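
The abstract is truncated before the approximation is defined. As a generic illustration of the kind of hardware-friendly shortcut involved, and not the ACBN method itself, the sketch below replaces the exact variance computation with a range-based standard-deviation estimate.

```python
import numpy as np

def range_batchnorm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch norm with an approximate scale: the per-feature standard
    deviation is estimated from the value range instead of the exact
    square/accumulate/sqrt chain. Illustration only, not ACBN."""
    n = x.shape[0]
    mean = x.mean(axis=0)
    # for roughly Gaussian activations, E[max - min] is about 2*sigma*sqrt(2*ln n)
    sigma_approx = (x.max(axis=0) - x.min(axis=0)) / (2.0 * np.sqrt(2.0 * np.log(n)))
    return gamma * (x - mean) / (sigma_approx + eps) + beta

x = np.random.randn(256, 8) * 3.0 + 1.0
y = range_batchnorm(x)
print(x.std(axis=0).round(2))                      # exact per-feature std, ~3.0
print(y.mean(axis=0).round(3), y.std(axis=0).round(2))
# near zero-mean; std is close to (not exactly) 1, since the range-based
# constant is only an asymptotic approximation
```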

Low-overhead inverted LUT design for bounded DNN activation functions on floating-point vector ALUs

SY Kim, CH Kim, WJ Lee, I Park, SW Kim - Microprocessors and …, 2022 - Elsevier
An inference engine uses floating-point numbers to provide high accuracy in deep neural
network computing despite its computing resource limitations. However, the computation for …
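
The snippet does not reach the LUT construction; below is a generic table-lookup approximation of a bounded activation (a sigmoid with saturation). It illustrates why bounded functions suit small lookup tables but does not reproduce the paper's inverted-LUT indexing.

```python
import numpy as np

# Generic LUT sigmoid: outside [-8, 8] the function has saturated, so a
# 256-entry table with clamped indexing covers the useful range.
X_MIN, X_MAX, ENTRIES = -8.0, 8.0, 256
_lut_x = np.linspace(X_MIN, X_MAX, ENTRIES, dtype=np.float32)
_lut_y = (1.0 / (1.0 + np.exp(-_lut_x))).astype(np.float32)

def sigmoid_lut(x):
    """Piecewise-constant sigmoid via table lookup with saturation."""
    idx = ((x - X_MIN) / (X_MAX - X_MIN) * (ENTRIES - 1)).astype(int)
    return _lut_y[np.clip(idx, 0, ENTRIES - 1)]

x = np.linspace(-12, 12, 7).astype(np.float32)
print(sigmoid_lut(x))
print(1.0 / (1.0 + np.exp(-x)))   # reference values
```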

All-Digital Computing-in-Memory Macro Supporting FP64-Based Fused Multiply-Add Operation

D Li, K Mo, L Liu, B Pan, W Li, W Kang, L Li - Applied Sciences, 2023 - mdpi.com
Recently, frequent data movement between computing units and memory during floating-point arithmetic has become a major problem for scientific computing. Computing-in-memory …
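
The snippet ends before the macro details; the accuracy argument for fused multiply-add is that a*b + c is rounded once instead of twice. The emulation below uses FP32 operands purely for illustration (an assumption; it is unrelated to the macro's FP64 circuit) to show the difference the single rounding can make.

```python
import numpy as np

def separate_mul_add_fp32(a, b, c):
    """Multiply then add, rounding to FP32 after each step."""
    prod = np.float32(a) * np.float32(b)     # first rounding
    return prod + np.float32(c)              # second rounding

def fused_mul_add_fp32(a, b, c):
    """Emulated FMA: FP64 holds the product of two FP32 values exactly,
    so essentially only the final conversion back to FP32 rounds."""
    return np.float32(np.float64(np.float32(a)) * np.float64(np.float32(b))
                      + np.float64(np.float32(c)))

a, b, c = 1.0000001, 1.0000001, -1.0000002
print(separate_mul_add_fp32(a, b, c))   # 0.0: the small term is lost to cancellation
print(fused_mul_add_fp32(a, b, c))      # ~1.42e-14: preserved by the single rounding
```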