Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel …
B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture …, 2014 - dl.acm.org
The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption …
S Mittal - Concurrency and computation: practice and …, 2017 - Wiley Online Library
Translation lookaside buffer (TLB) caches virtual to physical address translation information and is used in systems ranging from embedded devices to high‐end servers. Because TLB …
Energy harvesting system has huge potential to enable battery-less Internet of Things (IoT) services. However, it has been designed without a cache due to the difficulty of crash …
The TLB is increasingly a bottleneck for big data applications. In most designs, the number of TLB entries are highly constrained by latency requirements, and growing much more …
As hardware accelerators proliferate, there is a desire to logically integrate them more tightly with CPUs through interfaces such as shared virtual memory. Although this integration has …
Cache coherence is ubiquitous in shared memory multiprocessors because it provides a simple, high performance memory abstraction to programmers. Recent work suggests …
Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks …
A coherent global address space in a distributed system enables shared memory programming in a much larger scale than a single multicore or a single SMP. Without …