This work explores the performance of single- and multi-GPU computing on state-of-the-art NVIDIA- and AMD-based server-class hardware using various programming interfaces to …
P Czarnul - Computing and Informatics, 2020 - cai.sk
The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for …
Broadcast operations (e.g., MPI_Bcast) have been widely used in deep learning applications to exchange a large amount of data among multiple graphics processing units (GPUs) …
Sparse triangular solve is used in conjunction with Sparse LU for solving sparse linear systems, either as a direct solver or as a preconditioner. As GPUs have become a first-class …
This article presents a multi-GPU implementation of a Finite-Volume solver on a multi-resolution grid. The implementation completely offloads the computation to the GPUs and …
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be …
J Li, Q Chen, B Liu - The Journal of Supercomputing, 2017 - Springer
This paper described the nascent field of big health data classification and disease probability prediction based on a multi-GPU cluster MapReduce platform. Firstly, we …
MA Diaz, MA Solovchuk, TWH Sheu - Computers & Fluids, 2018 - Elsevier
A double-precision numerical solver to describe the propagation of high-intensity ultrasound fluctuations using a novel finite-amplitude compressible acoustic model working in multiple …
Broadcast is a widely used operation in many streaming and deep learning applications to disseminate large amounts of data on emerging heterogeneous High-Performance …