Morpheus: Creating application objects efficiently for heterogeneous computing

HW Tseng, Q Zhao, Y Zhou, M Gahagan… - ACM SIGARCH …, 2016 - dl.acm.org
In high performance computing systems, object deserialization can become a surprisingly
important bottleneck: in our test, a set of general-purpose, highly parallelized applications …

The landscape of GPU-centric communication

D Unat, I Turimbetov, MKT Issa, D Sağbili… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, GPUs have become the preferred accelerators for HPC and ML applications
due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter …

Network endpoint congestion control for fine-grained communication

N Jiang, L Dennison, WJ Dally - … of the International Conference for High …, 2015 - dl.acm.org
Endpoint congestion in HPC networks creates tree saturation that is detrimental to
performance. Endpoint congestion can be alleviated by reducing the injection rate of traffic …

Multi-GPU Communication Schemes for Iterative Solvers: When CPUs Are Not in Charge

I Ismayilov, J Baydamirli, D Sağbili, M Wahib… - Proceedings of the 37th …, 2023 - dl.acm.org
This paper proposes a fully autonomous execution model for multi-GPU applications that
completely excludes the involvement of the CPU beyond the initial kernel launch. In a typical …

InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU

L Oden, H Fröning - The International Journal of High …, 2017 - journals.sagepub.com
Due to their massive parallelism and high performance per Watt, GPUs have gained high
popularity in high-performance computing and are a strong candidate for future exascale …

dCUDA: hardware supported overlap of computation and communication

T Gysi, J Bär, T Hoefler - SC'16: Proceedings of the …, 2016 - ieeexplore.ieee.org
Over the last decade, CUDA and the underlying GPU hardware architecture have
continuously gained popularity in various high-performance computing application domains …

GPU triggered networking for intra-kernel communications

M LeBeane, K Hamidouche, B Benton… - Proceedings of the …, 2017 - dl.acm.org
GPUs are widespread across clusters of compute nodes due to their attractive performance
for data parallel codes. However, communicating between GPUs across the cluster is …

GPU initiated OpenSHMEM: correct and efficient intra-kernel networking for dGPUs

K Hamidouche, M LeBeane - Proceedings of the 25th ACM SIGPLAN …, 2020 - dl.acm.org
Current state-of-the-art in GPU networking utilizes a host-centric, kernel-boundary
communication model that reduces performance and increases code complexity. To address …

Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters

K Hamidouche, A Venkatesh, AA Awan… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
GPUDirect RDMA (GDR) brings the high-performance communication capabilities of RDMA
networks like InfiniBand (IB) to GPUs (referred to as "Device"). It enables IB network …

Scalable communication architecture for network-attached accelerators

S Neuwirth, D Frey, M Nuessle… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
On the road to Exascale computing, novel communication architectures are required to
overcome the limitations of host-centric accelerators. Typically, accelerator devices require a …