DuctTeip: An efficient programming model for distributed task-based parallel computing

A Zafari, E Larsson, M Tillenius - Parallel Computing, 2019 - Elsevier
Current high-performance computer systems used for scientific computing typically combine
shared memory computational nodes in a distributed memory environment. Extracting high …

CPU+GPU programming of stencil computations for resource-efficient use of GPU clusters

M Sourouri, J Langguth, F Spiga… - 2015 IEEE 18th …, 2015 - ieeexplore.ieee.org
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and
handling MPI communication. The unused computing power of the CPUs, however, can be …

dOCAL: high-level distributed programming with OpenCL and CUDA

A Rasch, J Bigge, M Wrodarczyk, R Schulze… - The Journal of …, 2020 - Springer
In the state-of-the-art parallel programming approaches OpenCL and CUDA, so-called host
code is required for a program's execution. Efficiently implementing host code is often a …

Automatic annotation of tasks in structured code

P Ramos, G Souza, D Soares, G Araújo… - Proceedings of the 27th …, 2018 - dl.acm.org
This paper describes the design and implementation of a suite of static analyses and code
generation techniques to annotate programs with OpenMP pragmas for task parallelism …

A cloud-unaware programming model for easy development of composite services

E Tejedor, J Ejarque, F Lordan… - 2011 IEEE Third …, 2011 - ieeexplore.ieee.org
Cloud computing is inherently service-oriented: cloud applications are delivered to
consumers as services via the Internet. Therefore, these applications can potentially benefit …

A general and fast distributed system for large-scale dynamic programming applications

C Wang, C Yu, S Tang, J Xiao, J Sun, X Meng - Parallel Computing, 2016 - Elsevier
Dynamic programming is an important technique widely used in many scientific applications.
Due to the massive volume of applications' data in practice, parallel and distributed DP is a …

Orchestral: a lightweight framework for parallel simulations of cell-cell communication

A Coulier, A Hellander - … Conference on e-Science (e-Science), 2018 - ieeexplore.ieee.org
We develop a modeling and simulation framework capable of massively parallel simulation
of multicellular systems with spatially resolved stochastic kinetics in individual cells. By the …

Myrmics: A scalable runtime system for global address spaces

S Lyberis - 2013 - publications.ics.forth.gr
For the first decades of the semiconductor industry, processor chips had a single CPU core.
Processor architecture evolved gradually to include many features that extracted …

Highly scalable eigensolvers for petaflop applications

T Auckenthaler - 2013 - mediatum.ub.tum.de
This thesis presents the development of a new eigensolver for the use in massively parallel
systems. Current implementations lack both parallel and sequential efficiency on modern …

Performance evaluation of a high-speed switching system based on the fibre channel standard

A Varma, V Sahai, R Bryant - [1993] Proceedings The 2nd …, 1993 - ieeexplore.ieee.org
The authors present a performance study of a switching system being designed for use in
the high-performance switching system (HPSS) project at the Lawrence Livermore National …