Cache coherence for GPU architectures

Transparent offloading and mapping (TOM) enabling programmer-transparent near-data processing in GPU systems

K Hsieh, E Ebrahimi, G Kim, N Chatterjee… - ACM SIGARCH …, 2016 - dl.acm.org

Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited
off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …

Cited by 311 · Related articles · All 23 versions

[PDF] github.io

Adaptive cache management for energy-efficient GPU computing

X Chen, LW Chang, CI Rodrigues, J Lv… - 2014 47th Annual …, 2014 - ieeexplore.ieee.org

With the SIMT execution model, GPUs can hide memory latency through massive
multithreading for many applications that have regular memory access patterns. To support …

Cited by 204 · Related articles · All 16 versions

[PDF] wisc.edu

Heterogeneous system coherence for integrated CPU-GPU systems

J Power, A Basu, J Gu, S Puthoor… - Proceedings of the 46th …, 2013 - dl.acm.org

Many future heterogeneous systems will integrate CPUs and GPUs physically on a single
chip and logically connect them via shared memory to avoid explicit data copying. Making …

Cited by 206 · Related articles · All 15 versions

[PDF] acm.org

MOESI-prime: preventing coherence-induced hammering in commodity workloads

K Loughlin, S Saroiu, A Wolman, YA Manerkar… - Proceedings of the 49th …, 2022 - dl.acm.org

Prior work shows that Rowhammer attacks, which flip bits in DRAM via frequent activations
of the same row(s), are viable. Adversaries typically mount these attacks via instruction …

Cited by 30 · Related articles · All 9 versions

[PDF] illinois.edu

Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces

B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture …, 2014 - dl.acm.org

The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent
example, necessitates a manageable programming model to ensure widespread adoption …

Cited by 196 · Related articles · All 13 versions

[PDF] acm.org

Locality-driven dynamic GPU cache bypassing

C Li, SL Song, H Dai, A Sidelnik, SKS Hari… - Proceedings of the 29th …, 2015 - dl.acm.org

This paper presents novel cache optimizations for massively parallel, throughput-oriented
architectures like GPUs. L1 data caches (L1 D-caches) are critical resources for providing …

Cited by 139 · Related articles · All 6 versions

[PDF] iczhiku.com

[BOOK][B] General-purpose graphics processor architectures

TM Aamodt, WWL Fung, TG Rogers, M Martonosi - 2018 - Springer

Originally developed to support video games, graphics processor units (GPUs) are now
increasingly used for general-purpose (non-graphics) applications ranging from machine …

Cited by 91 · Related articles · All 5 versions

[PDF] upc.edu

Beyond the socket: NUMA-aware GPUs

U Milic, O Villa, E Bolotin, A Arunkumar… - Proceedings of the 50th …, 2017 - dl.acm.org

GPUs achieve high throughput and power efficiency by employing many small single
instruction multiple thread (SIMT) cores. To minimize scheduling logic and performance …

Cited by 91 · Related articles · All 7 versions

[PDF] hal.science

A survey of techniques for managing and leveraging caches in GPUs

S Mittal - Journal of Circuits, Systems, and Computers, 2014 - World Scientific

Initially introduced as special-purpose accelerators for graphics applications, graphics
processing units (GPUs) have now emerged as general purpose computing platforms for a …

Cited by 40 · Related articles · All 11 versions

[PDF] tu-dresden.de

Locality-aware CTA clustering for modern GPUs

A Li, SL Song, W Liu, X Liu, A Kumar… - ACM SIGARCH …, 2017 - dl.acm.org

Cache is designed to exploit locality; however, the role of on-chip L1 data caches on modern
GPUs is often awkward. The locality among global memory requests from different SMs …

Cited by 91 · Related articles · All 13 versions
