Parallelizing the QUDA library for multi-GPU calculations in lattice quantum chromodynamics

R Babich, MA Clark, B Joó - SC'10: Proceedings of the 2010 …, 2010 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice
quantum chromodynamics (LQCD) calculations of importance in nuclear and particle …

Orders-of-magnitude performance increases in GPU-accelerated correlation of images from the International Space Station

PJ Lu, H Oki, CA Frey, GE Chamitoff, L Chiao… - Journal of real-time …, 2010 - Springer
We implement image correlation, a fundamental component of many real-time imaging and
tracking systems, on a graphics processing unit (GPU) using NVIDIA's CUDA platform. We …

Design of MILC lattice QCD application for GPU clusters

G Shi, S Gottlieb, A Torok… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
We present an implementation of the improved staggered quark action lattice QCD
computation designed for execution on a GPU cluster. The parallelization strategy is based …

Exploiting coarse-grained parallelism using cloud computing in massive power flow computation

DH Yoon, SK Kang, M Kim, Y Han - Energies, 2018 - mdpi.com
We present a novel architecture of parallel contingency analysis that accelerates massive
power flow computation using cloud computing. It leverages cloud computing to investigate …

Implementing Wilson-Dirac operator on the Cell Broadband Engine

KZ Ibrahim, F Bodin - Proceedings of the 22nd annual international …, 2008 - dl.acm.org
Computing the actions of Wilson-Dirac operator contributes most of the CPU time for the
grand challenge problem of simulating Lattice Quantum Chromodynamics (Lattice QCD) …

Blasting through lattice calculations using CUDA

K Barros, R Babich, R Brower, MA Clark… - arXiv preprint arXiv …, 2008 - arxiv.org
Modern graphics hardware is designed for highly parallel numerical tasks and provides
significant cost and performance benefits. Graphics hardware vendors are now making …

Efficient SIMDization and data management of the Lattice QCD computation on the Cell Broadband Engine

KZ Ibrahim, F Bodin - Scientific Programming, 2009 - content.iospress.com
Scientific Programming 17 (2009) 153–172. DOI 10.3233/SPR-2009-0275 …

From coarse- to fine-grained implementation of edge-directed interpolation using a GPU

J Wu, W Li, G Jeon - Information Sciences, 2017 - Elsevier
The new edge-directed interpolation (NEDI) algorithm is non-iterative and
orientation-adaptive. It achieves better edge performance in enhancing remote sensing images and …

Towards the petaflop for Lattice QCD simulations: the PetaQCD project

JCA d'Auriac, D Barthou, D Becirevic… - Journal of Physics …, 2010 - iopscience.iop.org
The study and design of a very ambitious petaflop cluster exclusively dedicated to Lattice
QCD simulations started in early '08 among a consortium of 7 laboratories (IN2P3, CNRS …

Implementing the Dslash operator in OpenCL

A Kowalski, X Shen - College of William and Mary Technical Report, 2010 - Citeseer
The Dslash operator is used in Lattice Quantum Chromodynamics (LQCD) applications to
implement a Wilson-Dirac sparse matrix-vector product. Typically the Dslash operation has …