DRAW: investigating benefits of adaptive fetch group size on GPU

K Kim, S Lee, MK Yoon, G Koo, WW Ro… - … Symposium on High …, 2016 - ieeexplore.ieee.org

This paper presents a pre-execution approach for improving GPU performance, called P-
mode (pre-execution mode). GPUs utilize a number of concurrent threads for hiding …

Cited by 61 Related articles All 6 versions

[PDF] iastate.edu

APRES: Improving cache efficiency by exploiting load characteristics on GPUs

Y Oh, K Kim, MK Yoon, JH Park, Y Park… - ACM SIGARCH …, 2016 - dl.acm.org

Long memory latency and limited throughput become performance bottlenecks of GPGPU
applications. The latency takes hundreds of cycles, which is difficult to hide by simply …

Cited by 45 Related articles All 9 versions

[PDF] github.io

A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity

M Khairy, AG Wassal, M Zahran - Journal of Parallel and Distributed …, 2019 - Elsevier

With the skyrocketing advances of process technology, the increased need to process huge
amount of data, and the pivotal need for power efficiency, the usage of Graphics Processing …

Cited by 36 Related articles All 4 versions

FineReg: Fine-grained register file management for augmenting GPU throughput

Y Oh, MK Yoon, WJ Song… - 2018 51st Annual IEEE …, 2018 - ieeexplore.ieee.org

Graphics processing units (GPUs) include a large amount of hardware resources for parallel
thread executions. However, the resources are not fully utilized during runtime, and …

Cited by 23 Related articles All 5 versions

Data optimization: Minimizing residual interprocessor data motion on SIMD machines

K Knobe, V Natarajan - Third Symposium on the Frontiers of …, 1990 - computer.org

Cited by 58 Related articles All 2 versions

Dynamic resizing on active warps scheduler to hide operation stalls on GPUs

MK Yoon, Y Oh, SH Kim, S Lee, D Kim… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org

This paper conducts a detailed study of the factors affecting the operation stalls in terms of
the fetch group size on the warp scheduler of GPUs. Throughout this paper, we reveal that …

Cited by 4 Related articles All 5 versions

[PDF] ubc.ca

Locality and scheduling in the massively multithreaded era

TG Rogers - 2015 - open.library.ubc.ca

Massively parallel processing devices, like Graphics Processing Units (GPUs), have the
ability to accelerate highly parallel workloads in an energy-efficient manner. However …

Cited by 5 Related articles All 4 versions

[PDF] koreascience.kr

Latency hiding based warp scheduling policy for high performance GPUs

GB Kim, JM Kim, CH Kim - Journal of The Korea Society of …, 2019 - koreascience.kr

LRR (Loose Round Robin) warp scheduling policy for GPU architecture results in
high warp-level parallelism and balanced loads across multiple warps. However, traditional …

Cited by 3 Related articles

A distributed architecture and design challenges of an astray pilgrim tracking system

MAR Abdeen - 2018 IEEE 16th Intl Conf on Dependable …, 2018 - ieeexplore.ieee.org

In this paper we present a distributed architecture to address the problems of managing,
tracking, and predicting astray persons in large crowds. Our case study considers the …

Cited by 2 Related articles All 2 versions

[PDF] googleapis.com

Methods and apparatus for intra-wave texture looping

AE Gruber - US Patent 11,640,647, 2023 - Google Patents

US11640647B2 - Methods and apparatus for intra-wave texture looping - Google Patents