Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

T Henriksen, NGW Serup, M Elsman… - Proceedings of the 38th …, 2017 - dl.acm.org
Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …

PTask: operating system abstractions to manage GPUs as compute devices

CJ Rossbach, J Currey, M Silberstein, B Ray… - Proceedings of the …, 2011 - dl.acm.org
We propose a new set of OS abstractions to support GPUs and other accelerator devices as
first class computing resources. These new abstractions, collectively called the PTask API …

Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code

M Steuwer, C Fensch, S Lindley, C Dubach - ACM SIGPLAN Notices, 2015 - dl.acm.org
Computers have become increasingly complex with the emergence of heterogeneous
hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous …

Dandelion: a compiler and runtime for heterogeneous systems

CJ Rossbach, Y Yu, J Currey, JP Martin… - Proceedings of the …, 2013 - dl.acm.org
Computer systems increasingly rely on heterogeneity to achieve greater performance,
scalability and energy efficiency. Because heterogeneous systems typically comprise …

Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms

Y Wen, Z Wang, MFP O'Boyle - 2014 21st International …, 2014 - ieeexplore.ieee.org
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive
as platforms for high performance computing. Such platforms are usually programmed using …

Portable mapping of data parallel programs to OpenCL for heterogeneous systems

D Grewe, Z Wang, MFP O'Boyle - Proceedings of the 2013 …, 2013 - ieeexplore.ieee.org
General purpose GPU based systems are highly attractive as they give potentially massive
performance at little cost. Realizing such potential is challenging due to the complexity of …

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems

J Lee, M Samadi, Y Park… - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org
Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each
device: the GPU handles data parallel work by taking advantage of its massive number of …

Compiling a high-level language for GPUs: (via language support for architectures and compilers)

C Dubach, P Cheng, R Rabbah, DF Bacon… - ACM SIGPLAN …, 2012 - dl.acm.org
Languages such as OpenCL and CUDA offer a standard interface for general-purpose
programming of GPUs. However, with these languages, programmers must explicitly …

Automatic optimization of thread-coarsening for graphics processors

A Magni, C Dubach, M O'Boyle - … of the 23rd international conference on …, 2014 - dl.acm.org
OpenCL has been designed to achieve functional portability across multi-core devices from
different vendors. However, the lack of a single cross-target optimizing compiler severely …

Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU

B Wu, Z Zhao, EZ Zhang, Y Jiang, X Shen - ACM SIGPLAN Notices, 2013 - dl.acm.org
The performance of Graphic Processing Units (GPU) is sensitive to irregular memory
references. Some recent work shows the promise of data reorganization for eliminating non …