Locality-aware CTA clustering for modern GPUs

A Li, SL Song, W Liu, X Liu, A Kumar… - ACM SIGARCH …, 2017 - dl.acm.org
Cache is designed to exploit locality; however, the role of on-chip L1 data caches on modern
GPUs is often awkward. The locality among global memory requests from different SMs …

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs

D Shen, SL Song, A Li, X Liu - … of the 2018 International Symposium on …, 2018 - dl.acm.org
General-purpose GPUs have been widely utilized to accelerate parallel applications. Given
a relatively complex programming model and fast architecture evolution, producing efficient …

Cooperative caching for GPUs

S Dublish, V Nagarajan, N Topham - ACM Transactions on Architecture …, 2016 - dl.acm.org
The rise of general-purpose computing on GPUs has influenced architectural innovation on
them. The introduction of an on-chip cache hierarchy is one such innovation. High L1 miss …

The implications of page size management on graph analytics

A Manocha, Z Yan, E Tureci, JL Aragón… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Graph representations of data are ubiquitous in analytic applications. However, graph
workloads are notorious for having irregular memory access patterns with variable access …

FineReg: Fine-grained register file management for augmenting GPU throughput

Y Oh, MK Yoon, WJ Song… - 2018 51st Annual IEEE …, 2018 - ieeexplore.ieee.org
Graphics processing units (GPUs) include a large number of hardware resources for parallel
thread execution. However, these resources are not fully utilized during runtime, and …

GraphFire: Synergizing fetch, insertion, and replacement policies for graph analytics

A Manocha, JL Aragón… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Despite their ubiquity in many important big-data applications, graph analytic kernels
continue to challenge modern memory hierarchies due to their frequent, long-latency …

Linebacker: Preserving victim cache lines in idle register files of GPUs

Y Oh, G Koo, M Annavaram, WW Ro - Proceedings of the 46th …, 2019 - dl.acm.org
Modern GPUs suffer from cache contention due to the limited cache size that is shared
across tens of concurrently running warps. To increase the per-warp cache size prior …

Scrabble: A fine-grained cache with adaptive merged block

C Zhang, Y Zeng, X Guo - IEEE Transactions on Computers, 2019 - ieeexplore.ieee.org
A large fraction of the microprocessor energy is consumed by the data movement in the
system. One of the reasons is the inefficiency in the conventional cache design. Cache …

Improving Data Movement Efficiency in the Memory Systems for Irregular Applications

C Zhang - 2021 - search.proquest.com
Modern processors have a large processor-memory frequency gap, which urges computer
designers to address the inefficiency of the memory system …

Efficient GPU-based query processing with pruned list caching in search engines

D Wang, W Yu, RJ Stones, J Ren… - 2017 IEEE 23rd …, 2017 - ieeexplore.ieee.org
There are two inherent obstacles to effectively using Graphics Processing Units (GPUs) for
query processing in search engines: (a) the highly restricted GPU memory space, and (b) the …