LegoOS: A disseminated, distributed OS for hardware resource disaggregation

Y Shan, Y Huang, Y Chen, Y Zhang - 13th USENIX Symposium on …, 2018 - usenix.org
The monolithic server model, where a server is the unit of deployment, operation, and failure,
is meeting its limits in the face of several recent hardware and application trends. To improve …

Syncron: Efficient synchronization support for near-data-processing architectures

C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …

Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces

B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture …, 2014 - dl.acm.org
The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent
example, necessitates a manageable programming model to ensure widespread adoption …

A survey of techniques for architecting TLBs

S Mittal - Concurrency and computation: practice and …, 2017 - Wiley Online Library
The translation lookaside buffer (TLB) caches virtual-to-physical address translation information
and is used in systems ranging from embedded devices to high-end servers. Because TLB …

Write-light cache for energy harvesting systems

J Choi, J Zeng, D Lee, C Min, C Jung - Proceedings of the 50th Annual …, 2023 - dl.acm.org
Energy harvesting systems have huge potential to enable battery-less Internet of Things (IoT)
services. However, they have been designed without a cache due to the difficulty of crash …

Mosaic pages: Big TLB reach with small pages

K Gosakan, J Han, W Kuszmaul, IN Mubarek… - Proceedings of the 28th …, 2023 - dl.acm.org
The TLB is increasingly a bottleneck for big data applications. In most designs, the number
of TLB entries is highly constrained by latency requirements, and growing much more …

Border control: Sandboxing accelerators

LE Olson, J Power, MD Hill, DA Wood - Proceedings of the 48th …, 2015 - dl.acm.org
As hardware accelerators proliferate, there is a desire to logically integrate them more tightly
with CPUs through interfaces such as shared virtual memory. Although this integration has …

Selective GPU caches to eliminate CPU-GPU HW cache coherence

N Agarwal, D Nellans, E Ebrahimi… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Cache coherence is ubiquitous in shared memory multiprocessors because it provides a
simple, high performance memory abstraction to programmers. Recent work suggests …

Victima: Drastically increasing address translation reach by leveraging underutilized cache resources

K Kanellopoulos, HC Nam, N Bostanci, R Bera… - Proceedings of the 56th …, 2023 - dl.acm.org
Address translation is a performance bottleneck in data-intensive workloads due to large
datasets and irregular access patterns that lead to frequent high-latency page table walks …

Turning centralized coherence and distributed critical-section execution on their head: A new approach for scalable distributed shared memory

S Kaxiras, D Klaftenegger, M Norgren, A Ros… - Proceedings of the 24th …, 2015 - dl.acm.org
A coherent global address space in a distributed system enables shared memory
programming at a much larger scale than a single multicore or a single SMP. Without …