Exploiting hierarchy in parallel computer networks to optimize collective operation performance

NT Karonis, BR De Supinski, I Foster… - Proceedings 14th …, 2000 - ieeexplore.ieee.org
The efficient implementation of collective communication operations has received much
attention. Initial efforts modeled network communication and produced "optimal" trees based …

Automatically tuned collective communications

SS Vadhiyar, GE Fagg… - SC'00: Proceedings of the …, 2000 - ieeexplore.ieee.org
The performance of the MPI's collective communications is critical in most MPI-based
applications. A general algorithm for a given collective communication operation may not …

MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics

AR Mamidala, R Kumar, D De… - 2008 Eighth IEEE …, 2008 - ieeexplore.ieee.org
The advances in multicore technology and modern interconnects is rapidly accelerating the
number of cores deployed in today's commodity clusters. A majority of parallel applications …

Bandwidth-efficient collective communication for clustered wide area systems

T Kielmann, HE Bal, S Gorlatch - … 14th International Parallel …, 2000 - ieeexplore.ieee.org
Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A
major problem in programming parallel applications for such platforms is their hierarchical …

Implementation and performance analysis of non-blocking collective operations for MPI

T Hoefler, A Lumsdaine, W Rehm - Proceedings of the 2007 ACM/IEEE …, 2007 - dl.acm.org
Collective operations and non-blocking point-to-point operations have always been part of
MPI. Although non-blocking collective operations are an obvious extension to MPI, there …

Efficient collective communication on heterogeneous networks of workstations

M Banikazemi, V Moorthy… - … Conference on Parallel …, 1998 - ieeexplore.ieee.org
Networks of Workstations (NOW) have become an attractive alternative platform for high
performance computing. Due to the commodity nature of workstations and interconnects and …

MPICH/MADIII: a cluster of clusters-enabled MPI implementation

G Mercier, O Aumage - Third IEEE International Symposium on Cluster …, 2003 - hal.science
This paper presents an MPI implementation that allows an easy and efficient use of the
interconnection of several clusters, of potentially heterogeneous nature (as far as the …

HierKNEM: An adaptive framework for kernel-assisted and topology-aware collective communications on many-core clusters

T Ma, G Bosilca, A Bouteiller… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Multicore Clusters, which have become the most prominent form of High Performance
Computing (HPC) systems, challenge the performance of MPI applications with non uniform …

Hierarchical collectives in MPICH2

H Zhu, D Goodell, W Gropp, R Thakur - Recent Advances in Parallel …, 2009 - Springer
Most parallel systems on which MPI is used are now hierarchical, such as systems with SMP
nodes. Many papers have shown algorithms that exploit shared memory to optimize …

Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem

D Buntinas, G Mercier, W Gropp - Sixth IEEE International …, 2006 - ieeexplore.ieee.org
This paper presents a new low-level communication subsystem called Nemesis. Nemesis
has been designed and implemented to be scalable and efficient both in the intranode …