BitBlade: Area and energy-efficient precision-scalable neural network accelerator with bitwise summation

S Ryu, H Kim, W Yi, JJ Kim - Proceedings of the 56th Annual Design …, 2019 - dl.acm.org
Deep Neural Networks (DNNs) have various performance requirements and power
constraints depending on applications. To maximize the energy-efficiency of hardware …

Width attention based convolutional neural network for retinal vessel segmentation

DE Alvarado-Carrillo, OS Dalmau-Cedeño - Expert Systems with …, 2022 - Elsevier
The analysis of the vascular tree is a fundamental part of the clinical assessment of retinal
images. The diversity of blood vessel calibers and curvatures, as well as the ocular vascular …

Hybrid accumulator factored systolic array for machine learning acceleration

K Inayat, J Chung - IEEE Transactions on Very Large Scale …, 2022 - ieeexplore.ieee.org
Deep learning applications have become ubiquitous in today's era and it has led to vast
development in machine learning (ML) accelerators. Systolic arrays have been a primary …

Factored radix-8 systolic array for tensor processing

I Ullah, K Inayat, JS Yang… - 2020 57th ACM/IEEE …, 2020 - ieeexplore.ieee.org
Systolic arrays are re-gaining the attention as the heart to accelerate machine learning
workloads. This paper shows that a large design space exists at the logic level despite the …

Integrated MAC-based systolic arrays: Design and performance evaluation

DN Devi, G Ajay Kumar, BG Gowda… - Proceedings of the Great …, 2024 - dl.acm.org
In the rapidly advancing landscape of computing, hardware accelerator designs are pivotal
for satisfying high performance and low power demands. Systolic array (SA) architectures …

A high-accuracy hardware-efficient multiply–accumulate (MAC) unit based on dual-mode truncation error compensation for CNNs

SN Tang, YS Han - IEEE Access, 2020 - ieeexplore.ieee.org
This paper presents a multiply-accumulate (MAC) unit that enables a dual-mode truncation
error compensation (TEC) scheme based on a fixed-width Booth multiplier (FWBM) for …

Factored Systolic Arrays Based on Radix-8 Multiplication for Machine Learning Acceleration

K Inayat, I Ullah, J Chung - IEEE Transactions on Very Large …, 2024 - ieeexplore.ieee.org
Systolic arrays (SAs) are re-gaining the attention as the heart to accelerate machine learning
workloads. This article shows that a large design space exists at the logic level despite the …

Booth Encoded Bit-Serial Multiply-Accumulate Units with Improved Area and Energy Efficiencies

X Cheng, Y Wang, J Liu, W Ding, H Lou, P Li - Electronics, 2023 - mdpi.com
Bit-serial multiply-accumulate units (MACs) play a crucial role in various hardware
accelerator applications, including deep learning, image processing, and signal processing …

Dilate-invariant temporal convolutional network for real-time edge applications

EA Ibrahim, B van den Dool, S De… - … on Circuits and …, 2021 - ieeexplore.ieee.org
Temporal Convolutional Networks (TCNs) involving mono channels as input, have shown
superior performance compared to state-of-the-art sequence detection recursive networks in …

Decisive structures for multirate FIR filter incorporating retiming and pipelining schemes

K Mariammal, V Dhandapani - Circuit World, 2020 - emerald.com
Purpose: Very large-scale integration (VLSI) digital signal processing became very popular
and is predominantly used in several emerging applications. The optimal design of multirate …