Efficient design for MPI asynchronous progress without dedicated resources

A Ruhela, H Subramoni, S Chakraborty, M Bayatpour… - Parallel Computing, 2019 - Elsevier
The overlap of computation and communication is critical for good performance of many
HPC applications. State-of-the-art designs for the asynchronous progress require specially …

Efficient asynchronous communication progress for MPI without dedicated resources

A Ruhela, H Subramoni, S Chakraborty… - Proceedings of the 25th …, 2018 - dl.acm.org
The overlap of computation and communication is critical for good performance of many
HPC applications. State-of-the-art designs for the asynchronous progress require specially …

OpenACC profiling support for clang and LLVM using clacc and TAU

C Coti, JE Denny, K Huck, S Lee… - 2020 IEEE/ACM …, 2020 - ieeexplore.ieee.org
Since its launch in 2010, OpenACC has evolved into one of the most widely used portable
programming models for accelerators on HPC systems today. Clacc is a project funded by …

Enabling callback-driven runtime introspection via MPI_T

MA Hermanns, NT Hjelm, M Knobloch… - Proceedings of the 25th …, 2018 - dl.acm.org
Understanding the behavior of parallel applications that use the Message Passing Interface
(MPI) is critical for optimizing communication performance. Performance tools for MPI …

Locating and categorizing inefficient communication patterns in HPC systems using inter-process communication traces

L Alawneh, A Hamou-Lhadj - Journal of Systems and Software, 2022 - Elsevier
High Performance Computing (HPC) systems are used in a variety of industrial and
research sectors to solve complex problems that require powerful computing platforms. For …

Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures

JM Hashmi, S Xu, B Ramesh… - 2020 IEEE …, 2020 - ieeexplore.ieee.org
Modern multi-/many-cores offer higher core-density, hardware multi-threading, deeper
memory hierarchies, and diverse architectural capabilities. While emerging cloud-based …

SYMBIOSYS: A methodology for performance analysis of composable HPC data services

S Ramesh, AD Malony, P Carns… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Microservices are a powerful new way of building, customizing, and deploying distributed
services owing to their flexibility and maintainability. Several large-scale distributed …

STaKTAU: profiling HPC applications' operating system usage

C Coti, K Huck, AD Malony - arXiv preprint arXiv:2304.11205, 2023 - arxiv.org
This paper presents an approach for measuring the time spent by HPC applications in the
operating system's kernel. We use the SystemTap interface to insert timers before and after …

Multi-level performance instrumentation for Kokkos applications using TAU

S Shende, N Chaimov, A Malony… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
The TAU Performance System® provides a multi-level instrumentation strategy for
instrumentation of Kokkos applications. Kokkos provides a performance portable API for …

Generating and Scaling a Multi-Language Test-Suite for MPI

J Adam, JB Besnard, P Canat, H Taboada… - Proceedings of the 30th …, 2023 - dl.acm.org
High-Performance Computing (HPC) is currently facing significant challenges. The
hardware pressure has become increasingly difficult to manage due to the lack of parallel …