Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Cnvlutin: Ineffectual-neuron-free deep neural network computing

J Albericio, P Judd, T Hetherington, T Aamodt… - ACM SIGARCH …, 2016 - dl.acm.org
This work observes that a large fraction of the computations performed by Deep Neural
Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of …

GPUWattch: Enabling energy optimizations in GPGPUs

J Leng, T Hetherington, A ElTantawy, S Gilani… - ACM SIGARCH …, 2013 - dl.acm.org
General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and
performance per watt has emerged as a more crucial evaluation metric than peak …

Cache-conscious wavefront scheduling

TG Rogers, M O'Connor… - 2012 45th Annual IEEE …, 2012 - ieeexplore.ieee.org
This paper studies the effects of hardware thread scheduling on cache management in
GPUs. We propose Cache-Conscious Wavefront Scheduling (CCWS), an adaptive …

A quantitative study of irregular programs on GPUs

M Burtscher, R Nasre, K Pingali - 2012 IEEE International …, 2012 - ieeexplore.ieee.org
GPUs have been used to accelerate many regular applications and, more recently, irregular
applications in which the control flow and memory access patterns are data-dependent and …

Improving GPU performance via large warps and two-level warp scheduling

V Narasiman, M Shebanow, CJ Lee… - Proceedings of the 44th …, 2011 - dl.acm.org
Due to their massive computational power, graphics processing units (GPUs) have become
a popular platform for executing general purpose parallel applications. GPU programming …

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

A Jog, O Kayiran, N Chidambaram Nachiappan… - ACM SIGPLAN …, 2013 - dl.acm.org
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …

Neither more nor less: Optimizing thread-level parallelism for GPGPUs

O Kayıran, A Jog, MT Kandemir… - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org
General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …

MIMD Programs Execution Support on SIMD Machines: A Holistic Survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture, supported by various high-
performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces

B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture …, 2014 - dl.acm.org
The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent
example, necessitates a manageable programming model to ensure widespread adoption …