Early evaluation of multicore systems soft error reliability using virtual platforms

FR da Rosa, R Reis, L Ost - 2018 2nd Conference on PhD …, 2018 - ieeexplore.ieee.org
The increasing computing capacity of multicore components like processors and graphics
processing units (GPUs) offers new opportunities for embedded and high-performance …

Evaluation of multicore systems soft error reliability using virtual platforms

F Rosa, L Ost, R Reis, S Davidmann… - 2017 15th IEEE …, 2017 - ieeexplore.ieee.org
Reliability is rapidly emerging as a major design metric in both embedded and high
performance computing (HPC) domains. Such systems are integrating modern multicore …

Characterizing the soft error vulnerability of multicores running multithreaded applications

N Soundararajan, A Sivasubramaniam… - ACM SIGMETRICS …, 2010 - dl.acm.org
Multicores have become the platform of choice across all market segments. Cost-effective
protection against soft errors is important in these environments, due to the need to move to …

Evaluation of compilers effects on OpenMP soft error resiliency

J Gava, V Bandeira, R Reis, L Ost - 2019 IEEE Computer …, 2019 - ieeexplore.ieee.org
Software engineers are using different compilers and parallel programming models (e.g.,
Pthreads, OpenMP) to take the best performance offered by multicore systems. Both …

gem5-FIM: a flexible and scalable multicore soft error assessment framework to early reliability design space explorations

FR Da Rosa, R Reis, L Ost - 2018 IEEE 9th Latin American …, 2018 - ieeexplore.ieee.org
Increasing chip power densities allied to the continuous technology shrink are making
emerging multiprocessor embedded systems more vulnerable to radiation-induced transient …

Exploring the impact of soft errors on NoC-based multiprocessor systems

FT Bortolon, G Abich, S Bampi, R Reis… - … on Circuits and …, 2018 - ieeexplore.ieee.org
Software reliability is an essential design metric in emerging large-scale multiprocessor
embedded systems. Designers should identify soft error susceptibility of multiple …

Measuring and understanding extreme-scale application resilience: A field study of 5,000,000 HPC application runs

C Di Martino, W Kramer, Z Kalbarczyk… - 2015 45th Annual IEEE …, 2015 - ieeexplore.ieee.org
This paper presents an in-depth characterization of the resiliency of more than 5 million HPC
application runs completed during the first 518 production days of Blue Waters, a 13.1 …

Reliability evaluation of mixed-precision architectures

FF dos Santos, C Lunardi, D Oliveira… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Novel computing architectures offer the possibility to execute float point operations with
different precisions. The execution of reduced precision operations, when acceptable for …

Efficient error-detection and recovery mechanisms for reliability and resiliency of multicores

S Kundu, O Khan - 2016 29th International Conference on VLSI …, 2016 - ieeexplore.ieee.org
With increasing density of power, traditional frequency scaling of processors came to an
end. The power wall forced the industry to seek performance from parallel processing …

[BOOK][B] Soft Error Reliability Using Virtual Platforms: Early Evaluation of Multicore Systems

FR da Rosa, L Ost, R Reis - 2020 - books.google.com
This book describes the benefits and drawbacks inherent in the use of virtual platforms (VPs)
to perform fast and early soft error assessment of multicore systems. The authors show that …