Design, use and evaluation of P-FSEFI: A parallel soft error fault injection framework for emulating soft errors in parallel applications

Q Guan, N DeBardeleben, P Wu, S Eidenbenz… - Proceedings of the 9th …, 2016 - dl.acm.org
Q Guan, N DeBardeleben, P Wu, S Eidenbenz, S Blanchard, L Monroe, E Baseman, L Tan
Proceedings of the 9th EAI International Conference on Simulation Tools and …, 2016 - dl.acm.org
Future exascale application programmers and users need to quantify an application's resilience and vulnerability to soft errors before running their codes on production supercomputers, because failures are costly and silent data corruption is hazardous. Barring a deep understanding of the resiliency of a particular application, vulnerability evaluation is commonly done through fault injection tools at either the software or hardware level. Hardware fault injection, while the most realistic, is limited to customized vendor chips, and applications usually cannot be evaluated at scale. Software fault injection is more practical and efficient, and many researchers use it as a reasonable approximation. With a sufficiently sophisticated software fault injection framework, an application can be studied to see how it would handle many of the errors that manifest at the application level. Using such a tool, a developer can progressively improve resilience at the locations they believe matter most for their target hardware.
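As a rough illustration of what software-level fault injection means in practice, the sketch below emulates a soft error by flipping one bit of a floating-point value in the middle of a computation. It is a generic example and not the P-FSEFI implementation, whose mechanism the abstract does not describe. The names `flip_bit` and `dot`, and the `inject_at` and `bit` parameters, are hypothetical and chosen only for this example.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 double encoding flipped.

    Hypothetical helper for illustration; bit 0 is the least significant
    mantissa bit and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as a 64-bit unsigned integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    # Flip the chosen bit and reinterpret the result as a double again.
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return corrupted


def dot(xs, ys, inject_at=None, bit=0):
    """Dot product that can corrupt its accumulator at one iteration.

    `inject_at` is the loop index after which the fault is injected
    (None means a fault-free run). Both parameters are assumptions made
    for this sketch.
    """
    acc = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        acc += x * y
        if i == inject_at:
            acc = flip_bit(acc, bit)  # emulate a single-bit soft error
    return acc


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    golden = dot(xs, ys)
    faulty = dot(xs, ys, inject_at=0, bit=63)
    # A result that differs from the fault-free run without any crash
    # or error is a silent data corruption.
    print("golden:", golden, "faulty:", faulty, "silent corruption:", faulty != golden)
```

Comparing each faulty run against the fault-free result, over many injection points and bit positions, gives a simple picture of which parts of a computation tolerate a flipped bit and which silently produce wrong answers.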
ACM Digital Library