Evaluation of hardware data prefetchers on server processors

M Bakhshalipour, S Tabaeiaghdaei… - ACM Computing …, 2019 - dl.acm.org
Data prefetching, i.e., the act of predicting an application's future memory accesses and
fetching those that are not in the on-chip caches, is a well-known and widely used approach …
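As a rough illustration of the mechanism this snippet describes, below is a minimal software model of a per-PC stride prefetcher, one of the classic hardware schemes such evaluations cover; the table organization, prefetch degree, and names are illustrative assumptions, not a specific design from the paper.

```python
# Minimal model of a per-PC stride prefetcher: for each load instruction (PC),
# track the last address and stride; when the same stride repeats, predict the
# next addresses and "prefetch" those that are not already cached.
# Table size, prefetch degree, and naming are illustrative assumptions.

class StridePrefetcher:
    def __init__(self, degree=2):
        self.table = {}          # PC -> (last_addr, last_stride, confident)
        self.degree = degree     # how many blocks ahead to prefetch

    def access(self, pc, addr, cache):
        prefetches = []
        entry = self.table.get(pc)
        if entry is not None:
            last_addr, last_stride, _ = entry
            stride = addr - last_addr
            if stride == last_stride and stride != 0:
                # Stride confirmed: issue prefetches for blocks not in the cache.
                for i in range(1, self.degree + 1):
                    target = addr + i * stride
                    if target not in cache:
                        cache.add(target)
                        prefetches.append(target)
                self.table[pc] = (addr, stride, True)
            else:
                self.table[pc] = (addr, stride, False)
        else:
            self.table[pc] = (addr, 0, False)
        return prefetches


# Example: a loop streaming through memory with a 64-byte (one-block) stride.
cache = set()
pf = StridePrefetcher(degree=2)
for addr in range(0, 64 * 10, 64):
    issued = pf.access(pc=0x400123, addr=addr, cache=cache)
    print(f"access {addr:5d} -> prefetched {issued}")
```

Once the stride repeats, the model starts issuing useful prefetches ahead of the demand stream, which is the behavior this class of hardware prefetchers is built around.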

PAVER: Locality graph-based thread block scheduling for GPUs

D Tripathy, A Abdolrashidi, LN Bhuyan, L Zhou… - ACM Transactions on …, 2021 - dl.acm.org
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache
sizes per thread, leading to serious cache contention problems such as thrashing. Hence …

Convolutional neural network with element-wise filters to extract hierarchical topological features for brain networks

X Xing, J Ji, Y Yao - 2018 IEEE international conference on …, 2018 - ieeexplore.ieee.org
Human brain network analysis based on machine learning has received much attention in
the field of neuroimaging, where the application of convolutional neural networks (CNNs) is …

Enhancing server efficiency in the face of killer microseconds

A Mirhosseini, A Sriraman… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
We are entering an era of “killer microseconds” in data center applications. Killer
microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O …

CORF: Coalescing operand register file for GPUs

H Asghari Esfeden, F Khorasani, H Jeon… - Proceedings of the …, 2019 - dl.acm.org
The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of
threads that support the GPU processing model. The RF organization substantially affects …
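To make concrete why the RF must hold state for thousands of threads, here is a back-of-the-envelope sketch: assuming a 64 K x 32-bit register file per streaming multiprocessor and a 2,048-thread residency cap (common figures on recent NVIDIA SMs, used here purely as assumptions), per-thread register demand directly bounds how many threads can stay resident.

```python
# Back-of-the-envelope: how per-thread register usage limits resident threads
# on one GPU streaming multiprocessor (SM). The 64K-register file and the
# 2048-thread cap are common figures on recent NVIDIA SMs, assumed here.

REGS_PER_SM = 64 * 1024      # 32-bit registers in one SM's register file
MAX_THREADS_PER_SM = 2048    # architectural cap on resident threads (assumed)

def max_resident_threads(regs_per_thread: int) -> int:
    """Threads that fit given their register demand (ignoring allocation granularity)."""
    return min(MAX_THREADS_PER_SM, REGS_PER_SM // regs_per_thread)

for regs in (32, 64, 128, 255):
    t = max_resident_threads(regs)
    print(f"{regs:3d} regs/thread -> up to {t:4d} resident threads "
          f"({100 * t / MAX_THREADS_PER_SM:.0f}% of the thread cap)")
```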

NURA: A framework for supporting non-uniform resource accesses in GPUs

S Darabi, N Mahani, H Baxishi… - Proceedings of the …, 2022 - dl.acm.org
Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize
GPU resources, is still challenging. Some prior work (e.g., spatial multitasking) has …

In-register parameter caching for dynamic neural nets with virtual persistent processor specialization

F Khorasani, HA Esfeden… - 2018 51st Annual …, 2018 - ieeexplore.ieee.org
Dynamic neural networks offer greater representational flexibility than networks with
a fixed architecture and are extensively deployed in problems dealing with varying input …

ITAP: Idle-time-aware power management for GPU execution units

M Sadrosadati, SB Ehsani, H Falahati… - ACM Transactions on …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …

R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs

D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org
A common GPU programming practice is for adjacent threads to access data at neighboring
or fixed-stride memory addresses and perform computations on the fetched …
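The observation in this snippet can be sketched in a few lines: when each thread's address is an affine function of its global thread index (base + tid x stride), all 32 per-lane addresses of a warp follow from one leader address plus the stride, which is the redundancy such a design can exploit. The warp size, names, and reconstruction below are illustrative only, not R2D2's actual hardware mechanism.

```python
# Sketch of "linearity of address generation": when adjacent threads access
# neighboring or fixed-stride addresses, every lane's address in a warp is an
# affine function of its lane id, so one base address plus the stride is
# enough to reproduce all 32 per-lane addresses instead of recomputing each.
# Warp size and variable names are illustrative, not R2D2's actual design.

WARP_SIZE = 32

def per_thread_addresses(base, stride, warp_id):
    """Addresses as each thread would compute them: base + tid * stride."""
    first_tid = warp_id * WARP_SIZE
    return [base + (first_tid + lane) * stride for lane in range(WARP_SIZE)]

def warp_compressed_addresses(base, stride, warp_id):
    """Same addresses reconstructed from one leader value plus the stride."""
    leader = base + (warp_id * WARP_SIZE) * stride   # lane 0's address
    return [leader + lane * stride for lane in range(WARP_SIZE)]

base, stride = 0x1000, 4   # e.g., a float array indexed by global thread id
for warp_id in range(2):
    full = per_thread_addresses(base, stride, warp_id)
    compressed = warp_compressed_addresses(base, stride, warp_id)
    assert full == compressed
    print(f"warp {warp_id}: lane0 = {hex(full[0])}, stride = {stride}, "
          f"lane31 = {hex(full[-1])}")
```

The assertion holds for every warp, showing that per-lane address computation is redundant whenever the access pattern is affine in the thread index.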

BlockMaestro: Enabling programmer-transparent task-based execution in GPU systems

AA Abdolrashidi, HA Esfeden… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …