Custom hardware architectures for deep learning on portable devices: a review

KS Zaman, MBI Reaz, SHM Ali… - … on Neural Networks …, 2021 - ieeexplore.ieee.org
The staggering innovations and emergence of numerous deep learning (DL) applications
have forced researchers to reconsider hardware architecture to accommodate fast and …

Optimizing deep learning recommender systems training on CPU cluster architectures

D Kalamkar, E Georganas, S Srinivasan… - … Conference for High …, 2020 - ieeexplore.ieee.org
During the last two years, the goal of many researchers has been to squeeze the last bit of
performance out of HPC system for AI tasks. Often this discussion is held in the context of …

GTrans: Grouping and fusing transformer layers for neural machine translation

J Yang, Y Yin, L Yang, S Ma, H Huang… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
Transformer structure, stacked by a sequence of encoder and decoder network layers,
achieves significant development in neural machine translation. However, vanilla …

Optimizing deep learning RNN topologies on Intel architecture

K Banerjee, E Georganas, DD Kalamkar, B Ziv… - Supercomputing …, 2019 - superfri.org
Recurrent neural network (RNN) models have been found to be well suited for processing
temporal data. In this work, we present an optimized implementation of vanilla RNN cell and …

An Optimal Design Method of Conv2d Operator for TensorFlow Based on FPGA Accelerator

R Li, H Kan, D Su, Y Wang, H Zhao… - Proceedings of the 4th …, 2020 - dl.acm.org
Currently, TensorFlow architecture only supports CPU and GPU programming, and has not
yet formed a unified support standard for FPGAs. To the best of our knowledge, when …

Automatic Chinese-English Translation Algorithm based on Out-of-vocabulary Words in the Context of Cross-cultural Communication

J Duan, H Ma, J Wang - IEIE Transactions on Smart Processing & …, 2023 - dbpia.co.kr
In the context of cross-cultural communication, translation between languages has become
increasingly important. Based on automatic Chinese–English translation, this study …

A specification that supports FPGA devices on the TensorFlow framework

H Zhao, H Kan, YW Wang, Q Zhao, D Su… - Proceedings of the 2020 …, 2020 - dl.acm.org
With the rise of artificial intelligence and machine learning, many applications and services
require FPGA support to speed up the training process and improve efficiency. FPGA has its …

[CITATION][C] Optimizing Deep Learning RNN Topologies on Intel Architecture

D Dhiraj