A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives

B Peccerillo, M Mannino, A Mondelli… - Journal of Systems …, 2022 - Elsevier
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …

Dandelion: a compiler and runtime for heterogeneous systems

CJ Rossbach, Y Yu, J Currey, JP Martin… - Proceedings of the …, 2013 - dl.acm.org
Computer systems increasingly rely on heterogeneity to achieve greater performance,
scalability and energy efficiency. Because heterogeneous systems typically comprise …

Productive programming of GPU clusters with OmpSs

J Bueno, J Planas, A Duran, RM Badia… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Clusters of GPUs are emerging as a new computational scenario. Programming them
requires the use of hybrid models that increase the complexity of the applications, reducing …

A batch system with efficient adaptive scheduling for malleable and evolving applications

S Prabhakaran, M Neumann, S Rinke… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
The throughput of supercomputers depends not only on efficient job scheduling but also on
the type of jobs that form the workload. Malleable jobs are most favourable for a cluster as …

A survey on malleability solutions for high-performance distributed computing

JI Aliaga, M Castillo, S Iserte, I Martín-Álvarez… - Applied Sciences, 2022 - mdpi.com
Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-
Performance Computing (HPC) facilities is a cornerstone in the next generation of exascale …

Improving the integration of task nesting and dependencies in OpenMP

JM Perez, V Beltran, J Labarta… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
The tasking model of OpenMP 4.0 supports both nesting and the definition of dependences
between sibling tasks. A natural way to parallelize many codes with tasks is to first taskify the …

Implementing OmpSs support for regions of data in architectures with multiple address spaces

J Bueno, X Martorell, RM Badia, E Ayguadé… - Proceedings of the 27th …, 2013 - dl.acm.org
The need for features for managing complex data accesses in modern programming models
has increased due to the emerging hardware architectures. HPC hardware has moved …

Optimizing the LINPACK algorithm for large-scale PCIe-based CPU-GPU heterogeneous systems

G Tan, C Shui, Y Wang, X Yu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
There is a widening gap between GPU and other components (CPU, PCIe bus and
communication network) in heterogeneous parallel system. The gap forces us to orchestrate …

Task-based programming with OmpSs and its application

A Fernández, V Beltran, X Martorell, RM Badia… - Euro-Par 2014: Parallel …, 2014 - Springer
OmpSs is a task-based programming model that aims to provide portability and flexibility for
sequential codes while the performance is achieved by the dynamic exploitation of the …

Axiom: A scalable, efficient and reconfigurable embedded platform

R Giorgi, M Procaccini, F Khalili - 2019 Design, Automation & …, 2019 - ieeexplore.ieee.org
Cyber-Physical Systems (CPSs) are becoming widely used in every application that requires
interaction between humans and the physical environment. People expect this interaction to …