Cache-conscious wavefront scheduling

SM Habib, S Ries, M Muhlhauser - 2010 7th International …, 2010 - ieeexplore.ieee.org

Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …

Cited by 193 - Related articles - All 10 versions

[PDF] nsf.gov

Accel-Sim: An extensible simulation framework for validated GPU modeling

M Khairy, Z Shen, TM Aamodt… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org

In computer architecture, significant innovation frequently comes from industry. However, the
simulation tools used by industry are often not released for open use, and even when they …

Cited by 297 - Related articles - All 10 versions

[PDF] illinois.edu

Transparent offloading and mapping (TOM): Enabling programmer-transparent near-data processing in GPU systems

K Hsieh, E Ebrahimi, G Kim, N Chatterjee… - ACM SIGARCH …, 2016 - dl.acm.org

Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip
pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …

Cited by 327 - Related articles - All 23 versions

[PDF] acm.org

Scheduling techniques for GPU architectures with processing-in-memory capabilities

A Pattnaik, X Tang, A Jog, O Kayiran… - Proceedings of the …, 2016 - dl.acm.org

Processing data in or near memory (PIM), as opposed to in conventional computational units
in a processor, can greatly alleviate the performance and energy penalties of data transfers …

Cited by 245 - Related articles - All 15 versions

[PDF] cmu.edu

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

A Jog, O Kayiran, N Chidambaram Nachiappan… - ACM SIGPLAN …, 2013 - dl.acm.org

Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …

Cited by 386 - Related articles - All 21 versions

[PDF] uth.gr

Neither more nor less: Optimizing thread-level parallelism for GPGPUs

O Kayıran, A Jog, MT Kandemir… - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org

General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …

Cited by 341 - Related articles - All 16 versions

[PDF] sjtu.edu.cn

Simultaneous multikernel GPU: Multi-tasking throughput processors via fine-grained sharing

Z Wang, J Yang, R Melhem, B Childers… - … symposium on high …, 2016 - ieeexplore.ieee.org

Studies show that non-graphics programs can be less optimized for the GPU hardware,
leading to significant resource under-utilization. Sharing the GPU among multiple programs …

Cited by 201 - Related articles - All 6 versions

[PDF] acm.org

Mosaic: a GPU memory manager with application-transparent support for multiple page sizes

R Ausavarungnirun, J Landgraf, V Miller… - Proceedings of the 50th …, 2017 - dl.acm.org

Contemporary discrete GPUs support rich memory management features such as virtual
memory and demand paging. These features simplify GPU programming by providing a …

Cited by 160 - Related articles - All 26 versions

[PDF] semanticscholar.org

Improving GPGPU resource utilization through alternative thread block scheduling

M Lee, S Song, J Moon, J Kim, W Seo… - 2014 IEEE 20th …, 2014 - ieeexplore.ieee.org

High performance in GPGPU workloads is obtained by maximizing parallelism and fully
utilizing the available resources. The thousands of threads are assigned to each core in …

Cited by 224 - Related articles - All 8 versions

[PDF] psu.edu

Orchestrated scheduling and prefetching for GPGPUs

A Jog, O Kayiran, AK Mishra, MT Kandemir… - Proceedings of the 40th …, 2013 - dl.acm.org

In this paper, we present techniques that coordinate the thread scheduling and prefetching
decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better …

Cited by 257 - Related articles - All 19 versions
