DARM: control-flow melding for SIMT thread divergence reduction

C Saumya, K Sundararajah… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group
of threads—wavefront or warp—execute instructions in lockstep. When threads in a group …

An automated tool for analysis and tuning of GPU-accelerated code in HPC applications

K Zhou, X Meng, R Sai, D Grubisic… - … on Parallel and …, 2021 - ieeexplore.ieee.org
The US Department of Energy's fastest supercomputers and forthcoming exascale systems
employ Graphics Processing Units (GPUs) to increase the computational performance of …

GPU subwarp interleaving

S Damani, M Stephenson, R Rangan… - … Symposium on High …, 2022 - ieeexplore.ieee.org
Raytracing applications have naturally high thread divergence, low warp occupancy and are
limited by memory latency. In this paper, we present an architectural enhancement called …

Vulkan Vision: Ray tracing workload characterization using automatic graphics instrumentation

D Pankratz, T Nowicki, A Eltantawy… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
While there are mature performance monitoring, profiling and instrumentation tools to help
understanding the dynamic behaviour of general-purpose GPU applications, the abstract …

SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices

M Khairy, A Alawneh, A Barnes… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Contemporary data center servers process thousands of similar, independent requests per
minute. In the interest of programmer productivity and ease of scaling, workloads in data …

Control Flow Management in Modern GPUs

MA Shoushtary, JT Murgadas, A Gonzalez - arXiv preprint arXiv …, 2024 - arxiv.org
In GPUs, the control flow management mechanism determines which threads in a warp are
active at any point in time. This mechanism monitors the control flow of scalar threads within …

Performance Measurement, Analysis, and Optimization of GPU-accelerated Applications

K Zhou - 2022 - repository.rice.edu
The computing landscape is undergoing rapid evolution to meet the demand in
data-intensive applications and grand challenging scientific problems. Figure 1.1 illustrates …

Exploring core heterogeneity for GPU accelerators

Α Μοίρας - 2024 - dspace.lib.ntua.gr
GPUs, originally proposed solely as graphics-processing accelerators,
now find application in a steadily growing range of domains. The analysis of a …

DARM: Control-Flow Melding for SIMT Thread Divergence Reduction--Extended Version

C Saumya, K Sundararajah, M Kulkarni - arXiv preprint arXiv:2107.05681, 2021 - arxiv.org
GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group
of threads, wavefront or warp, execute instructions in lockstep. When threads in a group …

Taming Irregular Control-Flow with Targeted Compiler Transformations

CSG Waduge - 2023 - search.proquest.com
Irregular control-flow structures like deeply nested conditional branches are common in
real-world software applications. Improving the performance and efficiency of such programs is …