MPI performance analysis tools on Blue Gene/L

Scalable identification of load imbalance in parallel executions using call path profiles

NR Tallent, L Adhianto… - SC'10: Proceedings of …, 2010 - ieeexplore.ieee.org

Applications must scale well to make efficient use of today's class of petascale computers,
which contain hundreds of thousands of processor cores. Inefficiencies that do not even …

Cited by 98 · Related articles · All 11 versions

[PDF] core.ac.uk

Visualizing network traffic to understand the performance of massively parallel simulations

AG Landge, JA Levine, A Bhatele… - … on Visualization and …, 2012 - ieeexplore.ieee.org

The performance of massively parallel applications is often heavily impacted by the cost of
communication among compute nodes. However, determining how to best use the network …

Cited by 79 · Related articles · All 25 versions

[PDF] arxiv.org

Fliptracker: Understanding natural error resilience in hpc applications

L Guo, D Li, I Laguna, M Schulz - … : International Conference for …, 2018 - ieeexplore.ieee.org

As high-performance computing systems scale in size and computational power, the danger
of silent errors, i.e., errors that can bypass hardware detection mechanisms and impact …

Cited by 33 · Related articles · All 9 versions

[PDF] eudl.eu

A framework for end-to-end simulation of high-performance computing systems

WE Denzel, J Li, P Walker, Y Jin - Simulation, 2010 - journals.sagepub.com

We present an end-to-end simulation framework that is capable of simulating High-
Performance Computing (HPC) systems with hundreds of thousands of interconnected …

Cited by 83 · Related articles · All 19 versions

[PDF] acm.org

Optimal scheduling of in-situ analysis for large-scale scientific simulations

P Malakar, V Vishwanath, T Munson, C Knight… - Proceedings of the …, 2015 - dl.acm.org

Today's leadership computing facilities have enabled the execution of transformative
simulations at unprecedented scales. However, analyzing the huge amount of output from …

Cited by 42 · Related articles · All 3 versions

[PDF] osti.gov

BeeSwarm: enabling parallel scaling performance measurement in continuous integration for HPC applications

J Tronge, J Chen, P Grubel, T Randles… - 2021 36th IEEE/ACM …, 2021 - ieeexplore.ieee.org

Testing is one of the most important steps in software development: it ensures the quality of
software. Continuous Integration (CI) is a widely used testing standard that can report …

Cited by 9 · Related articles · All 5 versions

[PDF] researchgate.net

Scalable fine-grained call path tracing

NR Tallent, J Mellor-Crummey, M Franco… - Proceedings of the …, 2011 - dl.acm.org

Applications must scale well to make efficient use of even medium-scale parallel systems.
Because scaling problems are often difficult to diagnose, there is a critical need for scalable …

Cited by 51 · Related articles · All 7 versions

[PDF] pdx.edu

Evaluating similarity-based trace reduction techniques for scalable performance analysis

K Mohror, KL Karavanic - Proceedings of the conference on high …, 2009 - dl.acm.org

Event traces are required to correctly diagnose a number of performance problems that arise
on today's highly parallel systems. Unfortunately, the collection of event traces can produce …

Cited by 52 · Related articles · All 11 versions

[PDF] osti.gov

Lessons learned at 208k: towards debugging millions of cores

GL Lee, DH Ahn, DC Arnold… - SC'08: Proceedings …, 2008 - ieeexplore.ieee.org

Petascale systems will present several new challenges to performance and correctness
tools. Such machines may contain millions of cores, requiring that tools use scalable data …

Cited by 65 · Related articles · All 20 versions

[PDF] iupui.edu

A visual analytics system for optimizing communications in massively parallel applications

T Fujiwara, P Malakar, K Reda… - … IEEE Conference on …, 2017 - ieeexplore.ieee.org

Current and future supercomputers have tens of thousands of compute nodes
interconnected with high-dimensional networks and complex network topologies for …

Cited by 22 · Related articles · All 11 versions
