PARTANS: An autotuning framework for stencil computation on multi-GPU systems

T Lutz, C Fensch, M Cole - ACM Transactions on Architecture and Code …, 2013 - dl.acm.org
GPGPUs are a powerful and energy-efficient solution for many problems. For higher
performance or larger problems, it is necessary to distribute the problem across multiple …

Memory access patterns: The missing piece of the multi-GPU puzzle

T Ben-Nun, E Levy, A Barak, E Rubin - Proceedings of the International …, 2015 - dl.acm.org
With the increased popularity of multi-GPU nodes in modern HPC clusters, it is imperative to
develop matching programming paradigms for their efficient utilization. In order to take …

dOpenCL: Towards a uniform programming approach for distributed heterogeneous multi-/many-core systems

P Kegel, M Steuwer, S Gorlatch - 2012 IEEE 26th International …, 2012 - ieeexplore.ieee.org
Modern computer systems are becoming increasingly heterogeneous by comprising
multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such …

Development effort estimation in HPC

S Wienke, J Miller, M Schulz… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org
In order to cover the ever increasing demands for computational power, while meeting
electrical power and budget constraints, HPC systems are continuing to increase in …

Parallel patterns for heterogeneous CPU/GPU architectures: Structured parallelism from cluster to cloud

S Campa, M Danelutto, M Goli… - Future Generation …, 2014 - Elsevier
The widespread adoption of traditional heterogeneous systems has substantially improved
the computing power available and, in the meantime, raised optimisation issues related to …

On the support of task-parallel algorithmic skeletons for multi-GPU computing

F Alexandre, R Marques, H Paulino - Proceedings of the 29th Annual …, 2014 - dl.acm.org
An emerging trend in the field of Graphics Processing Unit (GPU) computing is the
harnessing of multiple devices to cope with scalability and performance requirements …

Introducing and implementing the allpairs skeleton for programming multi-GPU systems

M Steuwer, M Friese, S Albers, S Gorlatch - International Journal of Parallel …, 2014 - Springer
Algorithmic skeletons simplify software development: they abstract typical patterns of
parallelism and provide their efficient implementations, allowing the application developer to …

Converting data-parallelism to task-parallelism by rewrites: purely functional programs across multiple GPUs

BJ Svensson, M Vollmer, E Holk, TL McDonell… - Proceedings of the 4th …, 2015 - dl.acm.org
High-level domain-specific languages for array processing on the GPU are increasingly
common, but they typically only run on a single GPU. As computational power is distributed …

Area exam: General-purpose performance portable programming models for productive exascale computing

A Johnson - University of Oregon, Eugene, OR, USA. Area Exam …, 2020 - cs.uoregon.edu
Modern supercomputer architectures have grown increasingly complex and diverse since
the end of Moore's law in the mid-2000s, and are far more difficult to program than their …

A fair MAC scheme for EDCA based wireless networks

R He, X Fang - 2009 5th International Conference on Testbeds …, 2009 - ieeexplore.ieee.org
This paper presents a bandwidth occupied time proportion fair MAC algorithm
(BOTP-FMAC) for WLAN and Wireless Mesh Network (WMN). It aims to address the unfairness …