We introduce a systematic analysis in order to fuse CUDA kernels arising in efficient iterative methods for the solution of sparse linear systems. Our procedure characterizes the input and …
H Anzt, B Haugen, J Kurzak… - Concurrency and …, 2015 - Wiley Online Library
In this paper, we report extensive results and analysis of autotuning the computationally intensive graphics processing units kernel for dense matrix–matrix multiplication in double …
H Anzt, S Tomov, J Dongarra - … of the sixth international workshop on …, 2015 - dl.acm.org
In this paper we unveil some energy efficiency and performance frontiers for sparse computations on GPU-based supercomputers. To do this, we consider state-of-the-art …
The chances to reach Exascale or Ultrascale Computing are strongly connected with the problem of the energy consumption for processing applications. For physical and …
OpenMP is the de facto standard application programming interface (API) for on-node parallelism. The most popular OpenMP runtimes rely on POSIX threads (pthreads) …
JA Belloch, V Välimäki - 2016 IEEE International Conference …, 2016 - ieeexplore.ieee.org
A graphic equalizer is an adjustable filter in which the command gain of each frequency band is practically independent of the gains of other bands. Designing a graphic equalizer …
E Agullo, M Felšöci, A Guermouche… - 2022 IEEE 34th …, 2022 - ieeexplore.ieee.org
In the aeronautical industry, aeroacoustics is used to model the propagation of acoustic waves in air flows enveloping an aircraft in flight. This for instance allows one to simulate the …
T Rauber, G Rünger - Journal of Computational Science, 2021 - Elsevier
Ordinary differential equations (ODEs) are important for modelling many problems from science and engineering and efficient ODE solvers are required, for example when solving …
High-level parallel programming models (PMs) are becoming crucial in order to extract the computational power of current on-node multi-threaded parallelism. The most popular PMs …