Characterizing and enhancing global memory data coalescing on GPUs

N Fauzia, LN Pouchet… - 2015 IEEE/ACM …, 2015 - ieeexplore.ieee.org
Effective parallel programming for GPUs requires careful attention to several factors,
including ensuring coalesced access of data from global memory. There is a need for tools …

Compiler-assisted data streaming for regular code structures

N Neves, P Tomás, N Roma - IEEE Transactions on Computers, 2020 - ieeexplore.ieee.org
The performance of modern processors is often limited by execution stalls resulting from
long memory access latencies. Compile-time optimizations, deep cache hierarchies and …

GPUDrano: Detecting uncoalesced accesses in GPU programs

R Alur, J Devietti, OS Navarro Leija… - … Aided Verification: 29th …, 2017 - Springer
Graphics Processing Units (GPUs) have become widespread and popular over the
past decade. Fully utilizing the parallel compute and memory resources that GPUs present …

Static detection of uncoalesced accesses in GPU programs

R Alur, J Devietti, OSN Leija, N Singhania - Formal Methods in System …, 2022 - Springer
GPU programming has become popular due to the high computational capabilities of GPUs.
Obtaining significant performance gains with GPU is however challenging and the …

[BOOK] Static Analysis for GPU Program Performance

N Singhania - 2018 - search.proquest.com
GPUs have become popular due to their high computational power. Data scientists rely on
GPUs to process loads of data being generated by their systems. From a humble beginning …

Stream data prefetcher for the GPU memory interface

N Neves, P Tomás, N Roma - The Journal of Supercomputing, 2018 - Springer
Data caches are often unable to efficiently cope with the massive and simultaneous requests
imposed by the SIMT execution model of modern GPUs. While software-aided cache …

[BOOK] Characterization of Data Locality Potential of CPU and GPU Applications through Dynamic Analysis

N Fauzia - 2015 - search.proquest.com
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak
processing rate to memory bandwidth), as highlighted by recent studies on Exascale …

[PDF] Hardware and Software Optimizations for GPU Resource Management

V Jatala - 2018 - vishweshjatala.github.io
A Thesis Submitted in Partial …