T Allen, R Ge - Proceedings of the International Conference for High …, 2021 - dl.acm.org
The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for the …
Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for its last-level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local …
Multi-GPU systems have emerged as a desirable platform to deliver high computing capabilities and large memory capacity to accommodate large dataset sizes. However …
M Khairy, V Nikiforov, D Nellans… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip will not be practical due to slowing growth in transistor density, low chip yields, and …
In recent years, the ever-growing application complexity and input dataset sizes have driven the popularity of multi-GPU systems as a desirable computing platform for many application …
Suboptimal management of memory and bandwidth is one of the primary causes of low performance on systems comprising multiple GPUs. Existing memory management solutions …
T Allen, R Ge - 2021 IEEE International Parallel and Distributed …, 2021 - ieeexplore.ieee.org
With GPUs becoming ubiquitous in HPC systems, NVIDIA's Unified Virtual Memory (UVM) is being adopted as a measure to simplify porting of complex codes to GPU platforms by …
J Lee, JM Lee, Y Oh, WJ Song… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
This paper presents an address translation scheme in GPUs named SnakeByte that can dynamically manage variable-sized pages and maximize TLB reach by recursively merging …
C Liu, Y Sun, TE Carlson - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
GPUs, due to their massively-parallel computing architectures, provide high performance for data-parallel applications. However, existing GPU simulators are too slow to enable …