GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters

HW Tseng, Q Zhao, Y Zhou, M Gahagan… - ACM SIGARCH …, 2016 - dl.acm.org

In high performance computing systems, object deserialization can become a surprisingly
important bottleneck---in our test, a set of general-purpose, highly parallelized applications …

Cited by 90 Related articles All 11 versions

[PDF] arxiv.org

The landscape of GPU-centric communication

D Unat, I Turimbetov, MKT Issa, D Sağbili… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, GPUs have become the preferred accelerators for HPC and ML applications
due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter …

Cited by 1 Related articles All 3 versions

Network endpoint congestion control for fine-grained communication

N Jiang, L Dennison, WJ Dally - … of the International Conference for High …, 2015 - dl.acm.org

Endpoint congestion in HPC networks creates tree saturation that is detrimental to
performance. Endpoint congestion can be alleviated by reducing the injection rate of traffic …

Cited by 59 Related articles All 3 versions

[PDF] acm.org

Multi-GPU Communication Schemes for Iterative Solvers: When CPUs Are Not in Charge

I Ismayilov, J Baydamirli, D Sağbili, M Wahib… - Proceedings of the 37th …, 2023 - dl.acm.org

This paper proposes a fully autonomous execution model for multi-GPU applications that
completely excludes the involvement of the CPU beyond the initial kernel launch. In a typical …

Cited by 12 Related articles

[PDF] researchgate.net

InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU

L Oden, H Fröning - The International Journal of High …, 2017 - journals.sagepub.com

Due to their massive parallelism and high performance per Watt, GPUs have gained high
popularity in high-performance computing and are a strong candidate for future exascale …

Cited by 46 Related articles All 10 versions

[PDF] ethz.ch

dCUDA: hardware supported overlap of computation and communication

T Gysi, J Bär, T Hoefler - SC'16: Proceedings of the …, 2016 - ieeexplore.ieee.org

Over the last decade, CUDA and the underlying GPU hardware architecture have
continuously gained popularity in various high-performance computing application domains …

Cited by 38 Related articles All 31 versions

[PDF] acm.org

GPU triggered networking for intra-kernel communications

M LeBeane, K Hamidouche, B Benton… - Proceedings of the …, 2017 - dl.acm.org

GPUs are widespread across clusters of compute nodes due to their attractive performance
for data parallel codes. However, communicating between GPUs across the cluster is …

Cited by 27 Related articles All 8 versions

[PDF] mlebeane.com

GPU initiated OpenSHMEM: correct and efficient intra-kernel networking for dGPUs

K Hamidouche, M LeBeane - Proceedings of the 25th ACM SIGPLAN …, 2020 - dl.acm.org

Current state-of-the-art in GPU networking utilizes a host-centric, kernel-boundary
communication model that reduces performance and increases code complexity. To address …

Cited by 20 Related articles All 3 versions

Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters

K Hamidouche, A Venkatesh, AA Awan… - 2015 IEEE …, 2015 - ieeexplore.ieee.org

GPUDirect RDMA (GDR) brings the high-performance communication capabilities of RDMA
networks like InfiniBand (IB) to GPUs (referred to as "Device"). It enables IB network …

Cited by 31 Related articles All 4 versions

[PDF] researchgate.net

Scalable communication architecture for network-attached accelerators

S Neuwirth, D Frey, M Nuessle… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

On the road to Exascale computing, novel communication architectures are required to
overcome the limitations of host-centric accelerators. Typically, accelerator devices require a …

Cited by 28 Related articles All 5 versions
