Making programming synonymous with programming for linear algebra libraries

V Volkov, JW Demmel - SC'08: Proceedings of the 2008 ACM …, 2008 - ieeexplore.ieee.org

We present performance results for dense linear algebra using recent NVIDIA GPUs. Our
matrix-matrix multiply routine (GEMM) runs up to 60% faster than the vendor's …

Cited by 1164 · Related articles · All 19 versions

[PDF] berkeley.edu

[PDF] LU, QR and Cholesky factorizations using vector capabilities of GPUs

V Volkov, J Demmel - 2008 - eecs.berkeley.edu

We present performance results for dense linear algebra using the 8-series NVIDIA GPUs.
Our matrix-matrix multiply routine (GEMM) runs 60% faster than the vendor implementation …

Cited by 244 · Related articles · All 8 versions

[PDF] google.com

An extensible system for multilevel automatic data partition and mapping

A Gonzalez-Escribano, Y Torres… - … on Parallel and …, 2013 - ieeexplore.ieee.org

Automatic data distribution is a key feature to obtain efficient implementations from abstract
and portable parallel codes. We present a highly efficient and extensible runtime library that …

Cited by 67 · Related articles · All 6 versions

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

JC Thibault, I Senocak - The Journal of Supercomputing, 2012 - Springer

Graphics processor units (GPU) that are originally designed for graphics rendering have
emerged as massively-parallel “co-processors” to the central processing unit (CPU). Small …

Cited by 68 · Related articles · All 9 versions

A prefetching technique for prediction of porous media flows

V Ginting, F Pereira, A Rahunanthan - Computational Geosciences, 2014 - Springer

In many applications in flows through porous media, one needs to determine the properties
of subsurface to detect, monitor, or predict the actions of natural or induced forces. Here, we …

Cited by 10 · Related articles · All 6 versions

[PDF] boisestate.edu

Implementation of a Cartesian grid incompressible Navier-Stokes solver on multi-GPU desktop platforms using CUDA

JC Thibault - 2009 - scholarworks.boisestate.edu

Today's Graphics Processor Units (GPU) are powerful computation platforms used
not only for graphic rendering but also for multi-purpose computation. Now reaching a …

Cited by 12 · Related articles · All 6 versions

[PDF] uva.es

Automatic data partitioning applied to multigrid PDE solvers

J Fresno, A González-Escribano… - 2011 19th international …, 2011 - ieeexplore.ieee.org

This paper studies the impact of using automatic data-layout techniques on the process of
coding the well-known multigrid MG NAS parallel benchmark. We describe the sequential …

Cited by 13 · Related articles · All 12 versions

[PDF] sun.ac.za

GPU acceleration of matrix-based methods in computational electromagnetics

E Lezar - 2011 - scholar.sun.ac.za

This work considers the acceleration of matrix-based computational electromagnetic (CEM)
techniques using graphics processing units (GPUs). These massively parallel processors …

Cited by 6 · Related articles · All 4 versions

[PDF] researchgate.net

OpenVX integration into the visual development environment

A Syschikov, B Sedov, K Nedovodeev… - International Journal of …, 2018 - igi-global.com

The OpenVX standard has appeared as an answer from the computer vision community to
the challenge of accelerating vision applications on embedded heterogeneous platforms. It …

Cited by 3 · Related articles · All 5 versions

Towards FFT-based direct numerical simulations of turbulent flows on a GPU

C Rucki, AJ Chandy - International Journal of Modeling, Simulation …, 2014 - World Scientific

The accurate simulation of turbulence and the implementation of corresponding turbulence
models are both critical to the understanding of the complex physics behind turbulent flows …

Cited by 3 · Related articles · All 3 versions
