A survey on hardware accelerators and optimization techniques for RNNs

S Mittal, S Umesh - Journal of Systems Architecture, 2021 - Elsevier
Abstract: Recurrent neural networks (RNNs) are powerful artificial intelligence models that
have shown remarkable effectiveness in several tasks such as music generation, speech …

Bingo spatial data prefetcher

M Bakhshalipour, M Shakerinava… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Applications extensively use data objects with a regular and fixed layout, which leads to the
recurrence of access patterns over memory regions. Spatial data prefetching techniques …
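The snippet above describes the core idea behind spatial prefetching: because data objects have a regular, fixed layout, the set of blocks touched within a memory region tends to recur, so a prefetcher can learn a region's access "footprint" and replay it when the same trigger recurs. The toy Python sketch below illustrates that idea only; the block size, region size, and (PC, offset) trigger key are assumptions for illustration, and Bingo's actual event selection and table organization are considerably more involved.

```python
BLOCK = 64          # cache-block size in bytes (assumed)
REGION_BLOCKS = 32  # blocks per spatial region, i.e. 2 KB regions (assumed)

class SpatialFootprintPrefetcher:
    """Toy footprint-based spatial prefetcher: records which blocks of a
    region are touched after a trigger access, and replays that footprint
    when the same trigger (PC, region-offset) recurs in a new region.
    Illustrative sketch only, not Bingo's actual mechanism."""

    def __init__(self):
        self.training = {}   # region id -> (trigger key, footprint bitmask)
        self.history = {}    # trigger key -> last learned footprint bitmask

    def access(self, pc, addr):
        block = addr // BLOCK
        region = block // REGION_BLOCKS
        offset = block % REGION_BLOCKS
        prefetches = []
        if region not in self.training:
            key = (pc, offset)               # first access = trigger event
            self.training[region] = (key, 0)
            if key in self.history:          # known trigger: replay footprint
                fp = self.history[key]
                base = region * REGION_BLOCKS
                prefetches = [(base + i) * BLOCK
                              for i in range(REGION_BLOCKS)
                              if fp >> i & 1 and i != offset]
        key, fp = self.training[region]
        self.training[region] = (key, fp | (1 << offset))
        self.history[key] = self.training[region][1]
        return prefetches
```

A real design would also need to bound and evict the training and history tables; the sketch keeps them unbounded for clarity.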

Evaluation of hardware data prefetchers on server processors

M Bakhshalipour, S Tabaeiaghdaei… - ACM Computing …, 2019 - dl.acm.org
Data prefetching, i.e., the act of predicting an application's future memory accesses and
fetching those that are not in the on-chip caches, is a well-known and widely used approach …
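The snippet above gives the general definition of data prefetching. One of the classic hardware schemes the survey space covers is the PC-indexed stride prefetcher, sketched below in Python as a minimal illustration: a reference prediction table keeps, per load PC, the last address and last stride, and issues prefetches once the same stride repeats. The table shape and the `degree` parameter are assumptions for this sketch, not any particular processor's design.

```python
class StridePrefetcher:
    """Toy PC-indexed stride prefetcher: once a load PC exhibits the same
    address stride twice in a row, prefetch `degree` addresses ahead along
    that stride. Simplified sketch of the classic scheme."""

    def __init__(self, degree=2):
        self.degree = degree
        self.table = {}  # pc -> (last_addr, last_stride, confidence)

    def access(self, pc, addr):
        last_addr, last_stride, conf = self.table.get(pc, (addr, 0, 0))
        stride = addr - last_addr
        if stride != 0 and stride == last_stride:
            conf = min(conf + 1, 3)   # stride confirmed: raise confidence
        else:
            conf = 0                  # stride changed: reset
        self.table[pc] = (addr, stride, conf)
        if conf >= 1:
            return [addr + stride * (i + 1) for i in range(self.degree)]
        return []
```

Server-class prefetchers evaluated in the paper are far more sophisticated, but they share this structure: learn a pattern per context, then run ahead of the demand stream.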

An experimental study of reduced-voltage operation in modern FPGAs for neural network acceleration

B Salami, EB Onural, IE Yuksel, F Koc… - 2020 50th Annual …, 2020 - ieeexplore.ieee.org
We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply
voltage below the nominal level, to improve the power-efficiency of Convolutional Neural …

Demystifying BERT: System design implications

S Pati, S Aga, N Jayasena… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …

Cortex: A compiler for recursive deep learning models

P Fegade, T Chen, P Gibbons… - Proceedings of Machine …, 2021 - proceedings.mlsys.org
Optimizing deep learning models is generally performed in two steps: (i) high-level graph
optimizations such as kernel fusion and (ii) low-level kernel optimizations such as those …

Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources

S Darabi, M Sadrosadati, N Akbarzadeh… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …

CORF: Coalescing operand register file for GPUs

H Asghari Esfeden, F Khorasani, H Jeon… - Proceedings of the …, 2019 - dl.acm.org
The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of
threads that support the GPU processing model. The RF organization substantially affects …

ITAP: Idle-time-aware power management for GPU execution units

M Sadrosadati, SB Ehsani, H Falahati… - ACM Transactions on …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …

BlockMaestro: Enabling programmer-transparent task-based execution in GPU systems

AA Abdolrashidi, HA Esfeden… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …