Evaluation of hardware data prefetchers on server processors

M Bakhshalipour, S Tabaeiaghdaei… - ACM Computing …, 2019 - dl.acm.org
Data prefetching, i.e., the act of predicting an application's future memory accesses and
fetching those that are not in the on-chip caches, is a well-known and widely used approach …
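As a rough illustration of the mechanism this snippet describes, below is a minimal software model of a per-PC stride prefetcher, one of the classic hardware schemes such evaluations cover; the table organization, prefetch degree, and names are illustrative assumptions, not a specific design from the paper.

```python
# Minimal model of a per-PC stride prefetcher: for each load instruction (PC),
# track the last address and stride; when the same stride repeats, predict the
# next addresses and "prefetch" those that are not already cached.
# Table size, prefetch degree, and naming are illustrative assumptions.

class StridePrefetcher:
    def __init__(self, degree=2):
        self.table = {}          # PC -> (last_addr, last_stride, confident)
        self.degree = degree     # how many blocks ahead to prefetch

    def access(self, pc, addr, cache):
        prefetches = []
        entry = self.table.get(pc)
        if entry is not None:
            last_addr, last_stride, _ = entry
            stride = addr - last_addr
            if stride == last_stride and stride != 0:
                # Stride confirmed: issue prefetches for blocks not in the cache.
                for i in range(1, self.degree + 1):
                    target = addr + i * stride
                    if target not in cache:
                        cache.add(target)
                        prefetches.append(target)
                self.table[pc] = (addr, stride, True)
            else:
                self.table[pc] = (addr, stride, False)
        else:
            self.table[pc] = (addr, 0, False)
        return prefetches


# Example: a loop streaming through memory with a 64-byte (one-block) stride.
cache = set()
pf = StridePrefetcher(degree=2)
for addr in range(0, 64 * 10, 64):
    issued = pf.access(pc=0x400123, addr=addr, cache=cache)
    print(f"access {addr:5d} -> prefetched {issued}")
```

Once the stride repeats, the model starts issuing useful prefetches ahead of the demand stream, which is the behavior this class of hardware prefetchers is built around.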

PAVER: Locality graph-based thread block scheduling for GPUs

D Tripathy, A Abdolrashidi, LN Bhuyan, L Zhou… - ACM Transactions on …, 2021 - dl.acm.org
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache
sizes per thread, leading to serious cache contention problems such as thrashing. Hence …

Convolutional neural network with element-wise filters to extract hierarchical topological features for brain networks

X Xing, J Ji, Y Yao - 2018 IEEE international conference on …, 2018 - ieeexplore.ieee.org
Human brain network analysis based on machine learning has received much attention in
the field of neuroimaging, where the application of convolutional neural networks (CNNs) is …

Enhancing server efficiency in the face of killer microseconds

A Mirhosseini, A Sriraman… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
We are entering an era of “killer microseconds” in data center applications. Killer
microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O …

CORF: Coalescing operand register file for GPUs

H Asghari Esfeden, F Khorasani, H Jeon… - Proceedings of the …, 2019 - dl.acm.org
The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of
threads that support the GPU processing model. The RF organization substantially affects …
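To make concrete why the RF must hold state for thousands of threads, here is a back-of-the-envelope sketch: assuming a 64 K x 32-bit register file per streaming multiprocessor and a 2,048-thread residency cap (common figures on recent NVIDIA SMs, used here purely as assumptions), per-thread register demand directly bounds how many threads can stay resident.

```python
# Back-of-the-envelope: how per-thread register usage limits resident threads
# on one GPU streaming multiprocessor (SM). The 64K-register file and the
# 2048-thread cap are common figures on recent NVIDIA SMs, assumed here.

REGS_PER_SM = 64 * 1024      # 32-bit registers in one SM's register file
MAX_THREADS_PER_SM = 2048    # architectural cap on resident threads (assumed)

def max_resident_threads(regs_per_thread: int) -> int:
    """Threads that fit given their register demand (ignoring allocation granularity)."""
    return min(MAX_THREADS_PER_SM, REGS_PER_SM // regs_per_thread)

for regs in (32, 64, 128, 255):
    t = max_resident_threads(regs)
    print(f"{regs:3d} regs/thread -> up to {t:4d} resident threads "
          f"({100 * t / MAX_THREADS_PER_SM:.0f}% of the thread cap)")
```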

NURA: A framework for supporting non-uniform resource accesses in GPUs

S Darabi, N Mahani, H Baxishi… - Proceedings of the …, 2022 - dl.acm.org
Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize
GPU resources, is still challenging. Some prior work (e.g., spatial multitasking) has …

In-register parameter caching for dynamic neural nets with virtual persistent processor specialization

F Khorasani, HA Esfeden… - 2018 51st Annual …, 2018 - ieeexplore.ieee.org
Dynamic neural networks offer greater representational flexibility than networks with
a fixed architecture and are extensively deployed in problems dealing with varying input …

ITAP: Idle-time-aware power management for GPU execution units

M Sadrosadati, SB Ehsani, H Falahati… - ACM Transactions on …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …

R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs

D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org
A common GPU programming practice is for adjacent threads to access data at neighboring
or fixed-stride memory addresses and perform computations on the fetched …
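The observation in this snippet can be sketched in a few lines: when each thread's address is an affine function of its global thread index (base + tid x stride), all 32 per-lane addresses of a warp follow from one leader address plus the stride, which is the redundancy such a design can exploit. The warp size, names, and reconstruction below are illustrative only, not R2D2's actual hardware mechanism.

```python
# Sketch of "linearity of address generation": when adjacent threads access
# neighboring or fixed-stride addresses, every lane's address in a warp is an
# affine function of its lane id, so one base address plus the stride is
# enough to reproduce all 32 per-lane addresses instead of recomputing each.
# Warp size and variable names are illustrative, not R2D2's actual design.

WARP_SIZE = 32

def per_thread_addresses(base, stride, warp_id):
    """Addresses as each thread would compute them: base + tid * stride."""
    first_tid = warp_id * WARP_SIZE
    return [base + (first_tid + lane) * stride for lane in range(WARP_SIZE)]

def warp_compressed_addresses(base, stride, warp_id):
    """Same addresses reconstructed from one leader value plus the stride."""
    leader = base + (warp_id * WARP_SIZE) * stride   # lane 0's address
    return [leader + lane * stride for lane in range(WARP_SIZE)]

base, stride = 0x1000, 4   # e.g., a float array indexed by global thread id
for warp_id in range(2):
    full = per_thread_addresses(base, stride, warp_id)
    compressed = warp_compressed_addresses(base, stride, warp_id)
    assert full == compressed
    print(f"warp {warp_id}: lane0 = {hex(full[0])}, stride = {stride}, "
          f"lane31 = {hex(full[-1])}")
```

The assertion holds for every warp, showing that per-lane address computation is redundant whenever the access pattern is affine in the thread index.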

BlockMaestro: Enabling programmer-transparent task-based execution in GPU systems

AA Abdolrashidi, HA Esfeden… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …