Thread block compaction for efficient SIMT control flow

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org

In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Cited by 54 Related articles All 3 versions

[PDF] toronto.edu

Cnvlutin: Ineffectual-neuron-free deep neural network computing

J Albericio, P Judd, T Hetherington, T Aamodt… - ACM SIGARCH …, 2016 - dl.acm.org

This work observes that a large fraction of the computations performed by Deep Neural
Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of …

Cited by 918 Related articles All 16 versions

[PDF] ubc.ca

GPUWattch: Enabling energy optimizations in GPGPUs

J Leng, T Hetherington, A ElTantawy, S Gilani… - ACM SIGARCH …, 2013 - dl.acm.org

General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and
performance per watt has emerged as a more crucial evaluation metric than peak …

Cited by 778 Related articles All 21 versions

[PDF] purdue.edu

Cache-conscious wavefront scheduling

TG Rogers, M O'Connor… - 2012 45th Annual IEEE …, 2012 - ieeexplore.ieee.org

This paper studies the effects of hardware thread scheduling on cache management in
GPUs. We propose Cache-Conscious Wavefront Scheduling (CCWS), an adaptive …

Cited by 556 Related articles All 12 versions

[PDF] utexas.edu

A quantitative study of irregular programs on GPUs

M Burtscher, R Nasre, K Pingali - 2012 IEEE International …, 2012 - ieeexplore.ieee.org

GPUs have been used to accelerate many regular applications and, more recently, irregular
applications in which the control flow and memory access patterns are data-dependent and …

Cited by 512 Related articles All 10 versions

[PDF] danielwong.org

Improving GPU performance via large warps and two-level warp scheduling

V Narasiman, M Shebanow, CJ Lee… - Proceedings of the 44th …, 2011 - dl.acm.org

Due to their massive computational power, graphics processing units (GPUs) have become
a popular platform for executing general purpose parallel applications. GPU programming …

Cited by 554 Related articles All 19 versions

[PDF] cmu.edu

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

A Jog, O Kayiran, N Chidambaram Nachiappan… - ACM SIGPLAN …, 2013 - dl.acm.org

Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …

Cited by 383 Related articles All 21 versions

[PDF] uth.gr

Neither more nor less: Optimizing thread-level parallelism for GPGPUs

O Kayıran, A Jog, MT Kandemir… - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org

General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …

Cited by 340 Related articles All 16 versions

[PDF] ieee.org

MIMD Programs Execution Support on SIMD Machines: A Holistic Survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org

The Single Instruction Multiple Data (SIMD) architecture, supported by various
high-performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

Cited by 2 Related articles All 2 versions

[PDF] illinois.edu

Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces

B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture …, 2014 - dl.acm.org

The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent
example, necessitates a manageable programming model to ensure widespread adoption …

Cited by 200 Related articles All 13 versions
