A survey on hardware accelerators and optimization techniques for RNNs

S Mittal, S Umesh - Journal of Systems Architecture, 2021 - Elsevier
Abstract: Recurrent neural networks (RNNs) are powerful artificial intelligence models that
have shown remarkable effectiveness in several tasks such as music generation, speech …

Bingo spatial data prefetcher

M Bakhshalipour, M Shakerinava… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Applications extensively use data objects with a regular and fixed layout, which leads to the
recurrence of access patterns over memory regions. Spatial data prefetching techniques …
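The snippet above describes the core idea behind spatial prefetching: because data objects have a regular, fixed layout, the set of blocks touched within a memory region tends to recur, so a prefetcher can learn a region's access "footprint" and replay it when the same trigger recurs. The toy Python sketch below illustrates that idea only; the block size, region size, and (PC, offset) trigger key are assumptions for illustration, and Bingo's actual event selection and table organization are considerably more involved.

```python
BLOCK = 64          # cache-block size in bytes (assumed)
REGION_BLOCKS = 32  # blocks per spatial region, i.e. 2 KB regions (assumed)

class SpatialFootprintPrefetcher:
    """Toy footprint-based spatial prefetcher: records which blocks of a
    region are touched after a trigger access, and replays that footprint
    when the same trigger (PC, region-offset) recurs in a new region.
    Illustrative sketch only, not Bingo's actual mechanism."""

    def __init__(self):
        self.training = {}   # region id -> (trigger key, footprint bitmask)
        self.history = {}    # trigger key -> last learned footprint bitmask

    def access(self, pc, addr):
        block = addr // BLOCK
        region = block // REGION_BLOCKS
        offset = block % REGION_BLOCKS
        prefetches = []
        if region not in self.training:
            key = (pc, offset)               # first access = trigger event
            self.training[region] = (key, 0)
            if key in self.history:          # known trigger: replay footprint
                fp = self.history[key]
                base = region * REGION_BLOCKS
                prefetches = [(base + i) * BLOCK
                              for i in range(REGION_BLOCKS)
                              if fp >> i & 1 and i != offset]
        key, fp = self.training[region]
        self.training[region] = (key, fp | (1 << offset))
        self.history[key] = self.training[region][1]
        return prefetches
```

A real design would also need to bound and evict the training and history tables; the sketch keeps them unbounded for clarity.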

Evaluation of hardware data prefetchers on server processors

M Bakhshalipour, S Tabaeiaghdaei… - ACM Computing …, 2019 - dl.acm.org
Data prefetching, i.e., the act of predicting an application's future memory accesses and
fetching those that are not in the on-chip caches, is a well-known and widely used approach …
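The snippet above gives the general definition of data prefetching. One of the classic hardware schemes the survey space covers is the PC-indexed stride prefetcher, sketched below in Python as a minimal illustration: a reference prediction table keeps, per load PC, the last address and last stride, and issues prefetches once the same stride repeats. The table shape and the `degree` parameter are assumptions for this sketch, not any particular processor's design.

```python
class StridePrefetcher:
    """Toy PC-indexed stride prefetcher: once a load PC exhibits the same
    address stride twice in a row, prefetch `degree` addresses ahead along
    that stride. Simplified sketch of the classic scheme."""

    def __init__(self, degree=2):
        self.degree = degree
        self.table = {}  # pc -> (last_addr, last_stride, confidence)

    def access(self, pc, addr):
        last_addr, last_stride, conf = self.table.get(pc, (addr, 0, 0))
        stride = addr - last_addr
        if stride != 0 and stride == last_stride:
            conf = min(conf + 1, 3)   # stride confirmed: raise confidence
        else:
            conf = 0                  # stride changed: reset
        self.table[pc] = (addr, stride, conf)
        if conf >= 1:
            return [addr + stride * (i + 1) for i in range(self.degree)]
        return []
```

Server-class prefetchers evaluated in the paper are far more sophisticated, but they share this structure: learn a pattern per context, then run ahead of the demand stream.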

An experimental study of reduced-voltage operation in modern FPGAs for neural network acceleration

B Salami, EB Onural, IE Yuksel, F Koc… - 2020 50th Annual …, 2020 - ieeexplore.ieee.org
We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply
voltage below the nominal level, to improve the power-efficiency of Convolutional Neural …

Demystifying BERT: System design implications

S Pati, S Aga, N Jayasena… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …

Cortex: A compiler for recursive deep learning models

P Fegade, T Chen, P Gibbons… - Proceedings of Machine …, 2021 - proceedings.mlsys.org
Optimizing deep learning models is generally performed in two steps: (i) high-level graph
optimizations such as kernel fusion and (ii) low-level kernel optimizations such as those …

Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources

S Darabi, M Sadrosadati, N Akbarzadeh… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …

CORF: Coalescing operand register file for GPUs

H Asghari Esfeden, F Khorasani, H Jeon… - Proceedings of the …, 2019 - dl.acm.org
The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of
threads that support the GPU processing model. The RF organization substantially affects …

ITAP: Idle-time-aware power management for GPU execution units

M Sadrosadati, SB Ehsani, H Falahati… - ACM Transactions on …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …

BlockMaestro: Enabling programmer-transparent task-based execution in GPU systems

AA Abdolrashidi, HA Esfeden… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …