Unleashing fine-grained parallelism on embedded many-core accelerators with lightweight OpenMP tasking

G Tagliavini, D Cesarini… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
In recent years, programmable many-core accelerators (PMCAs) have been introduced in
embedded systems to satisfy stringent performance/Watt requirements. This has increased …

libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms

F Broquedis, T Gautier, V Danjean - … on OpenMP, IWOMP 2012, Rome, Italy …, 2012 - Springer
To efficiently exploit high performance computing platforms, applications currently have to
express more and more finer-grain parallelism. The OpenMP standard allows programmers …

Enabling fine-grained OpenMP tasking on tightly-coupled shared memory clusters

P Burgio, G Tagliavini, A Marongiu… - … Design, Automation & …, 2013 - ieeexplore.ieee.org
Cluster-based architectures are increasingly being adopted to design embedded many-
cores. These platforms can deliver very high peak performance within a contained power …

Deploying OpenMP on an embedded multicore accelerator

SN Agathos, VV Dimakopoulos… - 2013 International …, 2013 - ieeexplore.ieee.org
Multiprocessor systems-on-chip (MPSoC) are now considered first-class citizens both in the
embedded systems and in the high-performance computing arenas, in the form of …

A comparative performance study of common and popular task-centric programming frameworks

A Podobas, M Brorsson… - … and Computation: Practice …, 2015 - Wiley Online Library
Programmers today face a bewildering array of parallel programming models and tools,
making it difficult to choose an appropriate one for each application. An increasingly popular …

Speeding up OpenMP tasking

SN Agathos, ND Kallimanis… - European Conference on …, 2012 - Springer
In this work we present a highly efficient implementation of OpenMP tasks. It is based on a
runtime infrastructure architected for data locality, a crucial prerequisite for exploiting the …

A code generator for energy-efficient wavefront parallelization of uniform dependence computations

Y Zou, S Rajopadhye - IEEE Transactions on Parallel and …, 2017 - ieeexplore.ieee.org
Energy is now critical in all aspects of computing. We address a class of programs that
includes so-called “stencil computations.” We address energy optimization of such …

Task-based execution of nested OpenMP loops

SN Agathos, PE Hadjidoukas… - International Workshop on …, 2012 - Springer
In this work we propose a novel technique to reduce the overheads related to nested
parallel loops in OpenMP programs. In particular we show that in many cases it is possible …

A quantitative evaluation of popular task-centric programming models and libraries

A Podobas, M Brorsson, KF Faxén - 2012 - diva-portal.org
Programmers today face a bewildering array of parallel programming models and tools,
making it difficult to choose an appropriate one for each application. The present study …

Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters

P Burgio, G Tagliavini, F Conti… - … Design, Automation & …, 2014 - ieeexplore.ieee.org
Modern designs for embedded systems are increasingly embracing cluster-based
architectures, where small sets of cores communicate through tightly-coupled shared …