Efficient softmax hardware architecture for deep neural networks

G Du, C Tian, Z Li, D Zhang, Y Yin… - … of the 2019 on Great Lakes …, 2019 - dl.acm.org
Deep neural network (DNN) has become a pivotal machine learning and object recognition
technology in the big data era. The softmax layer is one of the key component layers for …

An efficient and fast softmax hardware architecture (EFSHA) for deep neural networks

MA Hussain, TH Tsai - 2021 IEEE 3rd International conference …, 2021 - ieeexplore.ieee.org
Deep neural networks are widely used in computer vision applications due to their high
performance. However, DNNs involve a large number of computations in the training and …

Hardware acceleration of mutual information-based 3D image registration

CR Castro-Pareja, R Shekhar - Journal of Imaging Science and …, 2005 - library.imaging.org
© 2005, IS&T (The Society for Imaging Science and Technology). … the anatomy. Both linear
and elastic registration are useful tools in many medical applications [2-7]. While a linear …

FPGA implementation of variable-precision floating-point arithmetic

Y Lei, Y Dou, S Guo, J Zhou - … APPT 2011, Shanghai, China, September 26 …, 2011 - Springer
This paper explores the capability of FPGA solutions to accelerate scientific applications with
variable-precision floating-point (VP) arithmetic. First, we present a special-purpose Very …

VPFPAP: A special-purpose VLIW processor for variable-precision floating-point arithmetic

Y Lei, Y Dou, J Zhou, S Wang - 2011 21st International …, 2011 - ieeexplore.ieee.org
Many scientific computing applications require efficient variable-precision floating-point
arithmetic. This paper presents a special-purpose Variable-Precision Floating-Point …

FPGA-specific custom VLIW architecture for arbitrary precision floating-point arithmetic

Y Lei, Y Dou, J Zhou - IEICE TRANSACTIONS on Information and …, 2011 - search.ieice.org
Many scientific applications require efficient variable-precision floating-point arithmetic. This
paper presents a special-purpose Very Large Instruction Word (VLIW) architecture for …

FPGA-based acceleration of mutual information calculation for real-time 3D image registration

CR Castro-Pareja, JM Jagadeesh… - Real-Time Imaging …, 2004 - spiedigitallibrary.org
Real-time image registration is potentially an enabling technology for the effective and
efficient use of many image-guided diagnostic and treatment procedures relying on …

FPGA implementation of an exact dot product and its application in variable-precision floating-point arithmetic

Y Lei, Y Dou, Y Dong, J Zhou, F Xia - The Journal of Supercomputing, 2013 - Springer
The current paper explores the capability and flexibility of field programmable gate-arrays
(FPGAs) to implement variable-precision floating-point (VP) arithmetic. First, the VP exact dot …

Hardware implementation of the logarithm function using improved parabolic synthesis

J Lai - 2013 - lup.lub.lu.se
This thesis presents a design that approximates the fractional part of the based two
logarithm function by using Improved Parabolic Synthesis including its CMOS VLSI …

Thin client front-end processor for distributed speech recognition

KF Chow, SC Liew, KT Lua - 2003 IEEE International …, 2003 - ieeexplore.ieee.org
We present a front-end feature processor for distributed speech recognition for an integer-
based DSP, and we employ block floating point and range reduction for the computation of …