Benchmarking GPUs to tune dense linear algebra

V Volkov, JW Demmel - SC'08: Proceedings of the 2008 ACM …, 2008 - ieeexplore.ieee.org
We present performance results for dense linear algebra using recent NVIDIA GPUs. Our
matrix-matrix multiply routine (GEMM) runs up to 60% faster than the vendor's …

LU, QR and Cholesky factorizations using vector capabilities of GPUs

V Volkov, J Demmel - 2008 - eecs.berkeley.edu
We present performance results for dense linear algebra using the 8-series NVIDIA GPUs.
Our matrix-matrix multiply routine (GEMM) runs 60% faster than the vendor implementation …

An extensible system for multilevel automatic data partition and mapping

A Gonzalez-Escribano, Y Torres… - … on Parallel and …, 2013 - ieeexplore.ieee.org
Automatic data distribution is a key feature to obtain efficient implementations from abstract
and portable parallel codes. We present a highly efficient and extensible runtime library that …

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

JC Thibault, I Senocak - The Journal of Supercomputing, 2012 - Springer
Graphics processor units (GPU) that are originally designed for graphics rendering have
emerged as massively-parallel “co-processors” to the central processing unit (CPU). Small …

A prefetching technique for prediction of porous media flows

V Ginting, F Pereira, A Rahunanthan - Computational Geosciences, 2014 - Springer
In many applications in flows through porous media, one needs to determine the properties
of subsurface to detect, monitor, or predict the actions of natural or induced forces. Here, we …

Implementation of a Cartesian grid incompressible Navier-Stokes solver on multi-GPU desktop platforms using CUDA

JC Thibault - 2009 - scholarworks.boisestate.edu
Today's Graphics Processor Units (GPU) are powerful computation platforms used
not only for graphic rendering but also for multi-purpose computation. Now reaching a …

Automatic data partitioning applied to multigrid PDE solvers

J Fresno, A González-Escribano… - 2011 19th international …, 2011 - ieeexplore.ieee.org
This paper studies the impact of using automatic data-layout techniques on the process of
coding the well-known multigrid MG NAS parallel benchmark. We describe the sequential …

GPU acceleration of matrix-based methods in computational electromagnetics

E Lezar - 2011 - scholar.sun.ac.za
This work considers the acceleration of matrix-based computational electromagnetic (CEM)
techniques using graphics processing units (GPUs). These massively parallel processors …

OpenVX integration into the visual development environment

A Syschikov, B Sedov, K Nedovodeev… - International Journal of …, 2018 - igi-global.com
The OpenVX standard has appeared as an answer from the computer vision community to
the challenge of accelerating vision applications on embedded heterogeneous platforms. It …

Towards FFT-based direct numerical simulations of turbulent flows on a GPU

C Rucki, AJ Chandy - International Journal of Modeling, Simulation …, 2014 - World Scientific
The accurate simulation of turbulence and the implementation of corresponding turbulence
models are both critical to the understanding of the complex physics behind turbulent flows …