On the virtualization of CUDA based GPU remoting on ARM and X86 machines in the GVirtuS framework

R Montella, G Giunta, G Laccetti, M Lapegna… - International Journal of …, 2017 - Springer
The astonishing development of diverse and different hardware platforms is twofold: on one
side, the challenge for the exascale performance for big data processing and management; …

Single- and multi-GPU computing on NVIDIA- and AMD-based server platforms for solidification modeling application

K Halbiniak, N Meyer, K Rojek - Concurrency and Computation …, 2024 - Wiley Online Library
This work explores the performance of single- and multi-GPU computing on state-of-the-art
NVIDIA- and AMD-based server-class hardware using various programming interfaces to …

Investigation of parallel data processing using hybrid high performance CPU+GPU systems and CUDA streams

P Czarnul - Computing and informatics, 2020 - cai.sk
The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using
multiple CUDA streams for overlapping communication and computations. This is crucial for …

Efficient and scalable multi-source streaming broadcast on GPU clusters for deep learning

CH Chu, X Lu, AA Awan, H Subramoni… - 2017 46th …, 2017 - ieeexplore.ieee.org
Broadcast operations (e.g., MPI_Bcast) have been widely used in deep learning applications
to exchange a large amount of data among multiple graphics processing units (GPUs) …

A message-driven, multi-GPU parallel sparse triangular solver

N Ding, Y Liu, S Williams, XS Li - SIAM Conference on Applied and …, 2021 - SIAM
Sparse triangular solve is used in conjunction with Sparse LU for solving sparse linear
systems, either as a direct solver or as a preconditioner. As GPUs have become a first-class …

A general design for a scalable MPI-GPU multi-resolution 2D numerical solver

M Turchetto, A Dal Palu… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
This article presents a multi-GPU implementation of a Finite-Volume solver on a multi-resolution
grid. The implementation completely offloads the computation to the GPUs and …

CPU+GPU programming of stencil computations for resource-efficient use of GPU clusters

M Sourouri, J Langguth, F Spiga… - 2015 IEEE 18th …, 2015 - ieeexplore.ieee.org
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and
handling MPI communication. The unused computing power of the CPUs, however, can be …

Classification and disease probability prediction via machine learning programming based on multi-GPU cluster MapReduce system

J Li, Q Chen, B Liu - The Journal of Supercomputing, 2017 - Springer
This paper described the nascent field of big health data classification and disease
probability prediction based on multi-GPU cluster MapReduce platform. Firstly, we …

High-performance multi-GPU solver for describing nonlinear acoustic waves in homogeneous thermoviscous media

MA Diaz, MA Solovchuk, TWH Sheu - Computers & Fluids, 2018 - Elsevier
A double-precision numerical solver to describe the propagation of high-intensity ultrasound
fluctuations using a novel finite-amplitude compressible acoustic model working in multiple …

Exploiting hardware multicast and GPUDirect RDMA for efficient broadcast

CH Chu, X Lu, AA Awan, H Subramoni… - … on Parallel and …, 2018 - ieeexplore.ieee.org
Broadcast is a widely used operation in many streaming and deep learning applications to
disseminate large amounts of data on emerging heterogeneous High-Performance …