GPU-aware MPI on RDMA-enabled clusters: Design, implementation and evaluation

H Wang, S Potluri, D Bureddy… - … on Parallel and …, 2013 - ieeexplore.ieee.org
Designing high-performance and scalable applications on GPU clusters requires tackling
several challenges. The key challenge is the separate host memory and device memory …

Multi-GPU Communication Schemes for Iterative Solvers: When CPUs Are Not in Charge

I Ismayilov, J Baydamirli, D Sağbili, M Wahib… - Proceedings of the 37th …, 2023 - dl.acm.org
This paper proposes a fully autonomous execution model for multi-GPU applications that
completely excludes the involvement of the CPU beyond the initial kernel launch. In a typical …

XcalableACC: Extension of XcalableMP PGAS language using OpenACC for accelerator clusters

M Nakao, H Murai, T Shimosaka… - 2014 First Workshop …, 2014 - ieeexplore.ieee.org
The present paper introduces the XcalableACC (XACC) programming model, which is a
hybrid model of the XcalableMP (XMP) Partitioned Global Address Space (PGAS) language …

GPU-centric communication on NVIDIA GPU clusters with InfiniBand: A case study with OpenSHMEM

S Potluri, A Goswami, D Rossetti… - 2017 IEEE 24th …, 2017 - ieeexplore.ieee.org
GPUs have become an essential component for building compute clusters with high
compute density and high performance per watt. As such clusters scale to have 1000s of …

InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU

L Oden, H Fröning - The International Journal of High …, 2017 - journals.sagepub.com
Due to their massive parallelism and high performance per Watt, GPUs have gained high
popularity in high-performance computing and are a strong candidate for future exascale …

Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters

K Hamidouche, A Venkatesh, AA Awan… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
GPUDirect RDMA (GDR) brings the high-performance communication capabilities of RDMA
networks like InfiniBand (IB) to GPUs (referred to as "Device"). It enables IB network …

GPU-aware non-contiguous data movement in Open MPI

W Wu, G Bosilca, R Vandevaart, S Jeaugey… - Proceedings of the 25th …, 2016 - dl.acm.org
Due to better parallel density and power efficiency, GPUs have become more popular for
use in scientific applications. Many of these applications are based on the ubiquitous …

MPI-ACC: accelerator-aware MPI for scientific applications

AM Aji, LS Panwar, F Ji, K Murthy… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Data movement in high-performance computing systems accelerated by graphics
processing units (GPUs) remains a challenging problem. Data communication in popular …

A novel approach for big data processing using message passing interface based on memory mapping

SA Dheyab, MN Abdullah, BF Abed - Journal of Big Data, 2019 - Springer
The analysis and processing of big data are one of the most important challenges that
researchers are working on to find the best approaches to handle it with high performance …

Energy-efficient collective reduce and allreduce operations on distributed GPUs

L Oden, B Klenk, H Fröning - 2014 14th IEEE/ACM …, 2014 - ieeexplore.ieee.org
GPUs gain high popularity in High Performance Computing, due to their massive parallelism
and high performance per Watt. Despite their popularity, data transfer between multiple …