The landscape of GPU-centric communication

D Unat, I Turimbetov, MKT Issa, D Sağbili… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, GPUs have become the preferred accelerators for HPC and ML applications
due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter …

GPUrdma: GPU-side library for high performance networking from GPU kernels

F Daoud, A Watad, M Silberstein - … of the 6th international Workshop on …, 2016 - dl.acm.org
We present GPUrdma, a GPU-side library for performing Remote Direct Memory Accesses
(RDMA) across the network directly from GPU kernels. The library executes no code on …

Multi-GPU Communication Schemes for Iterative Solvers: When CPUs Are Not in Charge

I Ismayilov, J Baydamirli, D Sağbili, M Wahib… - Proceedings of the 37th …, 2023 - dl.acm.org
This paper proposes a fully autonomous execution model for multi-GPU applications that
completely excludes the involvement of the CPU beyond the initial kernel launch. In a typical …

FlexDriver: A network driver for your accelerator

H Eran, M Fudim, G Malka, G Shalom… - Proceedings of the 27th …, 2022 - dl.acm.org
We propose a new system design for connecting hardware and FPGA accelerators to the
network, allowing the accelerator to directly control commodity Network Interface Cards …

Toward FPGA-based HPC: Advancing interconnect technologies

J Lant, J Navaridas, M Luján, J Goodacre - IEEE Micro, 2019 - ieeexplore.ieee.org
HPC architects are currently facing myriad challenges from ever tighter power constraints
and changing workload characteristics. In this article, we discuss the current state of FPGAs …

AI-optimised tuneable sources for bandwidth-scalable, sub-nanosecond wavelength switching

T Gerard, C Parsonson, Z Shabka, B Thomsen… - Optics …, 2021 - opg.optica.org
Wavelength routed optical switching promises low power and latency networking for data
centres, but requires a wideband wavelength tuneable source (WTS) capable of sub …

Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters

R Shi, S Potluri, K Hamidouche… - … Conference on High …, 2014 - ieeexplore.ieee.org
Increasing number of MPI applications are being ported to take advantage of the compute
power offered by GPUs. Data movement on GPU clusters continues to be the major …

Software Aging and Multifractality of Memory Resources

M Shereshevsky, J Crowell, B Cukic, V Gandikota… - DSN, 2003 - scholar.archive.org
We investigate the dynamics of monitored memory resource utilizations in an operating
system under stress using quantitative methods of fractal analysis. In the experiments, we …

Exploring GPU stream-aware message passing using triggered operations

N Namashivayam, K Kandalla, T White… - arXiv preprint arXiv …, 2022 - arxiv.org
Modern heterogeneous supercomputing systems are comprised of compute blades that offer
CPUs and GPUs. On such systems, it is essential to move data efficiently between these …

dCUDA: hardware supported overlap of computation and communication

T Gysi, J Bär, T Hoefler - SC'16: Proceedings of the …, 2016 - ieeexplore.ieee.org
Over the last decade, CUDA and the underlying GPU hardware architecture have
continuously gained popularity in various high-performance computing application domains …