Silicon-based optoelectronics for general-purpose matrix computation: a review

P Xu, Z Zhou - Advanced Photonics, 2022 - spiedigitallibrary.org
Conventional electronic processors, which are the mainstream and almost invincible
hardware for computation, are approaching their limits in both computational power and …

Long short-term memory recurrent neural network for automatic speech recognition

J Oruh, S Viriri, A Adegun - IEEE Access, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) is one of the most demanding tasks in natural language
processing owing to its complexity. Recently, deep learning approaches have been …

When neural architecture search meets hardware implementation: from hardware awareness to co-design

X Zhang, W Jiang, Y Shi, J Hu - 2019 IEEE Computer Society …, 2019 - ieeexplore.ieee.org
Neural Architecture Search (NAS), which automatically identifies the best network architecture,
is a promising technique to respond to the ever-growing demand for application-specific …

A survey on the optimization of neural network accelerators for micro-AI on-device inference

AN Mazumder, J Meng, HA Rashid… - IEEE Journal on …, 2021 - ieeexplore.ieee.org
Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI)
tasks including computer vision, data analytics, robotics, etc. The efficacy of DNNs coincides …

heFFTe: Highly Efficient FFT for Exascale

A Ayala, S Tomov, A Haidar, J Dongarra - International Conference on …, 2020 - Springer
Exascale computing aspires to meet the increasing demands from large scientific
applications. Software targeting exascale is typically designed for heterogeneous …

Deep neural network for resource management in NOMA networks

N Yang, H Zhang, K Long, HY Hsieh… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Resource management plays a crucial role in improving sum rate of non-orthogonal multiple
access (NOMA) networks. However, the traditional resource management methods have …

In-depth analyses of unified virtual memory system for GPU accelerated computing

T Allen, R Ge - Proceedings of the International Conference for High …, 2021 - dl.acm.org
The abstraction of a shared memory space over separate CPU and GPU memory domains
has eased the burden of portability for many HPC codebases. However, users pay for the …

RNSnet: In-memory neural network acceleration using residue number system

S Salamat, M Imani, S Gupta… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
We live in a world where technological advances are continually creating more data than
what we can deal with. Machine learning algorithms, in particular Deep Neural Networks …

Winograd convolution for deep neural networks: Efficient point selection

SA Alam, A Anderson, B Barabasz… - ACM Transactions on …, 2022 - dl.acm.org
Convolutional neural networks (CNNs) have dramatically improved the accuracy of image,
video, and audio processing for tasks such as object recognition, image segmentation, and …

Cohort-based federated learning services for industrial collaboration on the edge

T Hiessl, SR Lakani, J Kemnitz, D Schall… - Journal of parallel and …, 2022 - Elsevier
Machine Learning (ML) is increasingly applied in industrial manufacturing, but often
performance is limited due to insufficient training data. While ML models can benefit from …