Using simulation to evaluate the performance of resilience strategies at scale S Levy, B Topp, KB Ferreira, D Arnold, T Hoefler, P Widener High Performance Computing Systems. Performance Modeling, Benchmarking and …, 2014 | 40 | 2014 |
Lessons learned from memory errors observed over the lifetime of Cielo S Levy, KB Ferreira, N DeBardeleben, T Siddiqua, V Sridharan, ... SC18: International Conference for High Performance Computing, Networking …, 2018 | 38 | 2018 |
Understanding the effects of communication and coordination on checkpointing at scale KB Ferreira, P Widener, S Levy, D Arnold, T Hoefler SC'14: Proceedings of the International Conference for High Performance …, 2014 | 35 | 2014 |
Understanding performance interference in next-generation HPC systems OH Mondragon, PG Bridges, S Levy, KB Ferreira, P Widener SC'16: Proceedings of the International Conference for High Performance …, 2016 | 31 | 2016 |
Lifetime memory reliability data from the field T Siddiqua, V Sridharan, SE Raasch, N DeBardeleben, KB Ferreira, ... 2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and …, 2017 | 26 | 2017 |
Faodel: Data management for next-generation application workflows C Ulmer, S Mukherjee, G Templet, S Levy, J Lofstead, P Widener, ... Proceedings of the 9th Workshop on Scientific Cloud Computing, 1-6, 2018 | 21 | 2018 |
Characterizing MPI matching via trace-based simulation KB Ferreira, S Levy, K Pedretti, RE Grant Proceedings of the 24th European MPI Users' Group Meeting, 1-11, 2017 | 21 | 2017 |
Improving DRAM fault characterization through machine learning E Baseman, N DeBardeleben, K Ferreira, S Levy, S Raasch, V Sridharan, ... 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems …, 2016 | 19 | 2016 |
Exploring the effect of noise on the performance benefit of nonblocking allreduce P Widener, KB Ferreira, S Levy, T Hoefler Proceedings of the 21st European MPI Users' Group Meeting, 77-82, 2014 | 15 | 2014 |
EMPRESS: extensible metadata provider for extreme-scale scientific simulations M Lawson, C Ulmer, S Mukherjee, G Templet, J Lofstead, S Levy, ... Proceedings of the 2nd Joint International Workshop on Parallel Data Storage …, 2017 | 14 | 2017 |
Using unreliable virtual hardware to inject errors in extreme-scale systems S Levy, MGF Dosanjh, PG Bridges, KB Ferreira Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale …, 2013 | 13 | 2013 |
An examination of the impact of failure distribution on coordinated checkpoint/restart S Levy, KB Ferreira Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale …, 2016 | 10 | 2016 |
Evaluating the feasibility of using memory content similarity to improve system resilience S Levy, PG Bridges, KB Ferreira, AP Thompson, C Trott Proceedings of the 3rd International Workshop on Runtime and Operating …, 2013 | 10 | 2013 |
Hardware MPI message matching: Insights into MPI matching behavior to inform design K Ferreira, RE Grant, MJ Levenhagen, S Levy, T Groves Concurrency and Computation: Practice and Experience 32 (3), e5150, 2020 | 9 | 2020 |
Using simulation to examine the effect of MPI message matching costs on application performance S Levy, KB Ferreira Proceedings of the 25th European MPI Users' Group Meeting, 1-11, 2018 | 9 | 2018 |
Scheduling in-situ analytics in next-generation applications OH Mondragon, PG Bridges, S Levy, KB Ferreira, P Widener 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016 | 9 | 2016 |
Using simulation to evaluate the performance of resilience strategies and process failures SN Levy, BE Topp, DC Arnold, KB Ferreira, P Widener, T Hoefler Sandia National Lab. (SNL-NM), Albuquerque, NM (United States), 2014 | 9 | 2014 |
“Smarter” NICs for faster molecular dynamics: a case study S Karamati, C Hughes, KS Hemmert, RE Grant, WW Schonbein, S Levy, ... 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2022 | 8 | 2022 |
On noise and the performance benefit of nonblocking collectives PM Widener, S Levy, KB Ferreira, T Hoefler The International Journal of High Performance Computing Applications 30 (1 …, 2016 | 8 | 2016 |
RaDD runtimes: Radical and different distributed runtimes with SmartNICs RE Grant, W Schonbein, S Levy 2020 IEEE/ACM Fourth Annual Workshop on Emerging Parallel and Distributed …, 2020 | 7 | 2020 |