Optimising purely functional GPU programs

TL McDonell, MMT Chakravarty, G Keller… - ACM SIGPLAN …, 2013 - dl.acm.org
Purely functional, embedded array programs are a good match for SIMD hardware, such as
GPUs. However, the naive compilation of such programs quickly leads to both code …

Finpar: A parallel financial benchmark

C Andreetta, V Bégot, J Berthold, M Elsman… - ACM Transactions on …, 2016 - dl.acm.org
Commodity many-core hardware is now mainstream, but parallel programming models are
still lagging behind in efficiently utilizing the application parallelism. There are (at least) two …

Streaming irregular arrays

R Clifton-Everest, TL McDonell… - Proceedings of the 10th …, 2017 - dl.acm.org
Previous work has demonstrated that it is possible to generate efficient and highly parallel
code for multicore CPUs and GPUs from combinator-based array languages for a range of …

Functional array streams

FM Madsen, R Clifton-Everest… - Proceedings of the 4th …, 2015 - dl.acm.org
Regular array languages for high performance computing based on aggregate operations
provide a convenient parallel programming model, which enables the generation of efficient …

Nessie: A NESL to CUDA compiler

J Reppy, N Sandler - … for Parallel Computing Workshop (CPC'15) …, 2015 - shonan.nii.ac.jp
NESL was designed for bulk-data processing on wide-vector machines (SIMD). Potentially a
good fit for GPU computation. First try [Bergstrom & Reppy '12] demonstrated …

Battling memory requirements of array programming through streaming

MRB Kristensen, J Avery, T Blum, SAF Lund… - … Computing: ISC High …, 2016 - Springer
A barrier to efficient array programming, for example in Python/NumPy, is that algorithms
written as pure array operations completely without loops, while most efficient on small input …

Streaming nested data parallelism on multicores

FM Madsen, A Filinski - Proceedings of the 5th International Workshop …, 2016 - dl.acm.org
The paradigm of nested data parallelism (NDP) allows a variety of semi-regular computation
tasks to be mapped onto SIMD-style hardware, including GPUs and vector units. However …

Nessie: A new NESL compiler

N Sandler - BA Honors Thesis, Department of Computer …, 2014 - nessie.cs.uchicago.edu
Graphics Processing Units (GPUs) have hundreds to thousands of cores, promising
extremely high performance for data-parallel applications. However, most general-purpose …

Streaming for Functional Data-Parallel Languages

FM Madsen - 2016 - hiperfit.dk
In this thesis, we investigate streaming as a general solution to the space inefficiency
commonly found in functional data-parallel programming languages. The data-parallel …

FinPar: A Parallel Financial Benchmark

J Berthold, M Elsman, F Henglein… - academia.edu
With the mainstream emergence of many-core architectures, such as GPGPUs, massive
parallelism has become a focus area of industrial application development. However …