An asynchronous dataflow-driven execution model for distributed accelerator computing

P Salzmann, F Knorr, P Thoman… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
While domain-specific HPC software packages continue to thrive and are vital to many
scientific communities, a general purpose high-productivity GPU cluster programming model …

SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving

K Fan, M D'Antonio, L Carpentieri, B Cosenza… - Proceedings of the …, 2023 - dl.acm.org
Energy-efficient computing uses power management techniques such as frequency scaling
to save energy. Implementing energy-efficient techniques on large-scale computing systems …

FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

X Chen, J Tian, I Beaver, C Freeman, Y Yan… - Proceedings of the …, 2024 - dl.acm.org
While both the database and high-performance computing (HPC) communities utilize
lossless compression methods to minimize floating-point data size, a disconnect persists …

SYCL-bench: a versatile cross-platform benchmark suite for heterogeneous computing

S Lal, A Alpay, P Salzmann, B Cosenza… - Euro-Par 2020: Parallel …, 2020 - Springer
The SYCL standard promises to enable high productivity in heterogeneous programming of
a broad range of parallel devices, including multicore CPUs, GPUs, and FPGAs. Its modern …

Declarative data flow in a graph-based distributed memory runtime system

F Knorr, P Thoman, T Fahringer - International Journal of Parallel …, 2023 - Springer
Runtime systems can significantly reduce the cognitive complexity of scientific applications,
narrowing the gap between systems engineering and domain science in HPC. One of the …

ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs

F Knorr, P Thoman, T Fahringer - … of the International Conference for High …, 2021 - dl.acm.org
Lossless data compression is a promising software approach for reducing the bandwidth
requirements of scientific applications on accelerator clusters without introducing …

Tunable and portable extreme-scale drug discovery platform at exascale: the LIGATE approach

G Palermo, G Accordi, D Gadioli, E Vitali… - Proceedings of the 20th …, 2023 - dl.acm.org
Today's digital revolution is having a dramatic impact on the pharmaceutical industry and the
entire healthcare system. The implementation of machine learning, extreme-scale computer …

The Italian research on HPC key technologies across EuroHPC

M Aldinucci, G Agosta, A Andreini… - Proceedings of the 18th …, 2021 - dl.acm.org
High-Performance Computing (HPC) is one of the strategic priorities for research and
innovation worldwide due to its relevance for industrial and scientific applications. We …

EMPI: enhanced message passing interface in modern C++

MS Beni, L Crisci, B Cosenza - 2023 IEEE/ACM 23rd …, 2023 - ieeexplore.ieee.org
Message Passing Interface (MPI) is a well-known standard for programming distributed and
HPC systems. While the community has been continuously improving MPI to address the …

The Celerity high-level API: C++20 for accelerator clusters

P Thoman, F Tischler, P Salzmann… - International Journal of …, 2022 - Springer
Providing convenient APIs and notations for data parallelism which remain accessible for
programmers while still providing good performance has been a long-term goal of …