hwloc: A generic framework for managing hardware affinities in HPC applications

F Broquedis, J Clet-Ortega, S Moreaud… - 2010 18th Euromicro …, 2010 - ieeexplore.ieee.org
The increasing numbers of cores, shared caches and memory nodes within machines
introduce a complex hardware topology. High-performance computing applications now …

Optimizing MPI communication on multi-GPU systems using CUDA inter-process communication

S Potluri, H Wang, D Bureddy, AK Singh… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Many modern clusters are being equipped with multiple GPUs per node to achieve better
compute density and power efficiency. However, moving data in/out of GPUs continues to …

On the Caching Schemes to Speed Up Program Reduction

Y Tian, X Zhang, Y Dong, Z Xu, M Zhang… - ACM Transactions on …, 2023 - dl.acm.org
Program reduction is a highly practical, widely demanded technique to help debug
language tools, such as compilers, interpreters and debuggers. Given a program P that …

KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

B Goglin, S Moreaud - Journal of Parallel and Distributed Computing, 2013 - Elsevier
The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …

High performance interconnect network for Tianhe system

XK Liao, ZB Pang, KF Wang, YT Lu, M Xie, J Xia… - Journal of Computer …, 2015 - Springer
In this paper, we present the Tianhe-2 interconnect network and message passing services.
We describe the architecture of the router and network interface chips, and highlight a set of …

FlexIO: I/O middleware for location-flexible scientific data analytics

F Zheng, H Zou, G Eisenhauer… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Increasingly severe I/O bottlenecks on High-End Computing machines are prompting
scientists to process simulation output data online while simulations are running and before …

Framework for scalable intra-node collective operations using shared memory

S Jain, R Kaleem, MG Balmana… - … Conference for High …, 2018 - ieeexplore.ieee.org
Collective operations are used in MPI programs to express common communication
patterns, collective computations, or synchronization. In many collectives, such as …

Popcorn: a replicated-kernel OS based on Linux

A Barbalace, B Ravindran, D Katz - Proceedings of the Linux Symposium …, 2014 - kernel.org
In recent years, the number of CPUs per platform has continuously increased, affecting
almost all segments of the computer market. Because of this trend, many researchers have …

The TH Express high performance interconnect networks

Z Pang, M Xie, J Zhang, Y Zheng, G Wang… - Frontiers of Computer …, 2014 - Springer
The interconnection network plays an important role in scalable high performance computer
(HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to …

Kernel assisted collective intra-node MPI communication among multi-core and many-core CPUs

T Ma, G Bosilca, A Bouteiller, B Goglin… - 2011 International …, 2011 - ieeexplore.ieee.org
Shared memory is among the most common approaches to implementing message passing
within multicore nodes. However, current shared memory techniques do not scale with …