[HTML][HTML] A taxonomy of task-based parallel programming technologies for high-performance computing

P Thoman, K Dichev, T Heller, R Iakymchuk… - The Journal of …, 2018 - Springer
Task-based programming models for shared memory—such as Cilk Plus and OpenMP 3—
are well established and documented. However, with the increase in parallel, many-core …

Resiliency in numerical algorithm design for extreme scale simulations

E Agullo, M Altenbernd, H Anzt… - … Journal of High …, 2022 - journals.sagepub.com
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …

Studying error propagation on application data structure and hardware

Z Ozturk, HR Topcuoglu, MT Kandemir - The Journal of Supercomputing, 2022 - Springer
As technology scales, transistors become smaller and aggressive power optimization
techniques combined with high operation frequencies and performance-enhancing …

Exploiting asynchrony from exact forward recovery for due in iterative solvers

L Jaulmes, M Casas, M Moretó, E Ayguadé… - Proceedings of the …, 2015 - dl.acm.org
This paper presents a method to protect iterative solvers from Detected and Uncorrected
Errors (DUE) relying on error detection techniques already available in commodity …

Checkpointing workflows for fail-stop errors

L Han, LC Canon, H Casanova… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
We consider the problem of orchestrating the execution of workflow applications structured
as Directed Acyclic Graphs (DAGs) on parallel computing platforms that are subject to fail …

Unified fault-tolerance framework for hybrid task-parallel message-passing applications

O Subasi, T Martsinkevich… - … Journal of High …, 2018 - journals.sagepub.com
We present a unified fault-tolerance framework for task-parallel message-passing
applications to mitigate transient errors. First, we propose a fault-tolerant message-logging …

Spatial support vector regression to detect silent errors in the exascale era

O Subasi, S Di, L Bautista-Gomez… - 2016 16th IEEE/ACM …, 2016 - ieeexplore.ieee.org
As the exascale era approaches, the increasing capacity of high-performance computing
(HPC) systems with targeted power and energy budget goals introduces significant …

Vits: video tagging system from massive web multimedia collections

D Fernández, D Varas, J Espadaler… - Proceedings of the …, 2017 - openaccess.thecvf.com
The popularization of multimedia content on the Web has arised the need to automatically
understand, index and retrieve it. In this paper we present ViTS, an automatic Video Tagging …

Enabling resilience in asynchronous many-task programming models

SR Paul, A Hayashi, N Slattengren, H Kolla… - Euro-Par 2019: Parallel …, 2019 - Springer
Resilience is an imminent issue for next-generation platforms due to projected increases in
soft/transient failures as part of the inherent trade-offs among performance, energy, and …

Exploring the capabilities of support vector machines in detecting silent data corruptions

O Subasi, S Di, L Bautista-Gomez… - … Informatics and Systems, 2018 - Elsevier
As the exascale era approaches, the increasing capacity of high-performance computing
(HPC) systems with targeted power and energy budget goals introduces significant …