InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU

D Unat, I Turimbetov, MKT Issa, D Sağbili… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, GPUs have become the preferred accelerators for HPC and ML applications
due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter …

Cited by 1 · Related articles · All 3 versions

[PDF] marksilberstein.com

GPUrdma: GPU-side library for high performance networking from GPU kernels

F Daoud, A Watad, M Silberstein - … of the 6th international Workshop on …, 2016 - dl.acm.org

We present GPUrdma, a GPU-side library for performing Remote Direct Memory Accesses
(RDMA) across the network directly from GPU kernels. The library executes no code on …

Cited by 67 · Related articles · All 5 versions

[PDF] acm.org

Multi-GPU Communication Schemes for Iterative Solvers: When CPUs Are Not in Charge

I Ismayilov, J Baydamirli, D Sağbili, M Wahib… - Proceedings of the 37th …, 2023 - dl.acm.org

This paper proposes a fully autonomous execution model for multi-GPU applications that
completely excludes the involvement of the CPU beyond the initial kernel launch. In a typical …

Cited by 12 · Related articles

[PDF] academia.edu

Flexdriver: A network driver for your accelerator

H Eran, M Fudim, G Malka, G Shalom… - Proceedings of the 27th …, 2022 - dl.acm.org

We propose a new system design for connecting hardware and FPGA accelerators to the
network, allowing the accelerator to directly control commodity Network Interface Cards …

Cited by 15 · Related articles · All 6 versions

[PDF] manchester.ac.uk

Toward FPGA-based HPC: Advancing interconnect technologies

J Lant, J Navaridas, M Luján, J Goodacre - IEEE Micro, 2019 - ieeexplore.ieee.org

HPC architects are currently facing myriad challenges from ever tighter power constraints
and changing workload characteristics. In this article, we discuss the current state of FPGAs …

Cited by 35 · Related articles · All 4 versions

[PDF] optica.org

AI-optimised tuneable sources for bandwidth-scalable, sub-nanosecond wavelength switching

T Gerard, C Parsonson, Z Shabka, B Thomsen… - Optics …, 2021 - opg.optica.org

Wavelength routed optical switching promises low power and latency networking for data
centres, but requires a wideband wavelength tuneable source (WTS) capable of sub …

Cited by 20 · Related articles · All 6 versions

Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters

R Shi, S Potluri, K Hamidouche… - … Conference on High …, 2014 - ieeexplore.ieee.org

Increasing number of MPI applications are being ported to take advantage of the compute
power offered by GPUs. Data movement on GPU clusters continues to be the major …

Cited by 59 · Related articles · All 3 versions

[PDF] archive.org

[PDF] Software Aging and Multifractality of Memory Resources.

M Shereshevsky, J Crowell, B Cukic, V Gandikota… - DSN, 2003 - scholar.archive.org

We investigate the dynamics of monitored memory resource utilizations in an operating
system under stress using quantitative methods of fractal analysis. In the experiments, we …

Cited by 97 · Related articles · All 5 versions

[PDF] arxiv.org

Exploring GPU stream-aware message passing using triggered operations

N Namashivayam, K Kandalla, T White… - arXiv preprint arXiv …, 2022 - arxiv.org

Modern heterogeneous supercomputing systems are comprised of compute blades that offer
CPUs and GPUs. On such systems, it is essential to move data efficiently between these …

Cited by 12 · Related articles · All 3 versions

[PDF] ethz.ch

dCUDA: hardware supported overlap of computation and communication

T Gysi, J Bär, T Hoefler - SC'16: Proceedings of the …, 2016 - ieeexplore.ieee.org

Over the last decade, CUDA and the underlying GPU hardware architecture have
continuously gained popularity in various high-performance computing application domains …

Cited by 38 · Related articles · All 31 versions
