Resiliency in numerical algorithm design for extreme scale simulations

E Agullo, M Altenbernd, H Anzt… - … Journal of High …, 2022 - journals.sagepub.com
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …

Algorithm-based fault-tolerant parallel sorting

ET Camargo, EPD Junior - International Journal of Critical …, 2024 - inderscienceonline.com
High performance computing (HPC) systems often require substantial resources, and can
take up to several hours or days to execute. Upon a failure, it is important to lose as little …

An algorithm-based fault tolerance strategy for the bitonic sort parallel algorithm

ET Camargo, EP Duarte - 2021 10th Latin-American …, 2021 - ieeexplore.ieee.org
High Performance Computing (HPC) systems are employed to solve hard problems and rely
on parallel algorithms that have very long execution times, up to several days. These …

Parallelized 0/1 Knapsack Algorithm Optimization in CPU-GPU-Based Heterogeneous System with Algorithm-based Fault Tolerance

MB Abid, S Shaha, U Kabir… - 2024 18th International …, 2024 - ieeexplore.ieee.org
A heterogeneous high-performance computing (HPC) system is an aggregation of CPUs
and GPUs through high-speed interconnection. As graphics processing units (GPUs) can …

Resiliency in Numerical Algorithm Design for Extreme Scale Simulations (Dagstuhl Seminar 20101)

L Giraud, U Rüde, L Stals - 2020 - drops.dagstuhl.de
This work is based on the seminar titled "Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations" held March 1-6, 2020 at Schloss Dagstuhl, that was attended by …