KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

B Goglin, S Moreaud - Journal of Parallel and Distributed Computing, 2013 - Elsevier
The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …

Kernel assisted collective intra-node MPI communication among multi-core and many-core CPUs

T Ma, G Bosilca, A Bouteiller, B Goglin… - 2011 International …, 2011 - ieeexplore.ieee.org
Shared memory is among the most common approaches to implementing message passing
within multicore nodes. However, current shared memory techniques do not scale with …

Contention-aware kernel-assisted MPI collectives for multi-/many-core systems

S Chakraborty, H Subramoni… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Multi-/many-core CPU based architectures are seeing widespread adoption due to their
unprecedented compute performance in a small power envelope. With the increasingly …

Gait analysis for human identification in frequency domain

S Yu, L Wang, W Hu, T Tan - … on Image and Graphics (ICIG'04), 2004 - ieeexplore.ieee.org
In this paper, we analyze the spatio-temporal human characteristic of moving silhouettes in
frequency domain, and find key Fourier descriptors that have better discriminatory capability …

Process distance-aware adaptive MPI collective communications

T Ma, T Herault, G Bosilca… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
Message Passing Interface (MPI) implementations provide a great flexibility to allow users to
arbitrarily bind processes to computing cores to fully exploit clusters of multicore/many-core …

XCluster synopses for structured XML content

N Polyzotis, M Garofalakis - 22nd International Conference on …, 2006 - ieeexplore.ieee.org
We tackle the difficult problem of summarizing the path/branching structure and value
content of an XML database that comprises both numeric and textual values. We introduce a …

A uGNI-based asynchronous message-driven runtime system for Cray supercomputers with Gemini interconnect

Y Sun, G Zheng, LV Kalé, TR Jones… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Gemini, the network for the new Cray XE/XK systems, features low latency, high bandwidth
and strong scalability. Its hardware support for remote direct memory access enables …

Optimizing point‐to‐point communication between adaptive MPI endpoints in shared memory

S White, LV Kale - Concurrency and Computation: Practice and …, 2020 - Wiley Online Library
Adaptive MPI is an implementation of the MPI standard that supports the virtualization of
ranks as user‐level threads, rather than OS processes. In this work, we optimize the …

Cooperative rendezvous protocols for improved performance and overlap

S Chakraborty, M Bayatpour, J Hashmi… - … Conference for High …, 2018 - ieeexplore.ieee.org
With the emergence of larger multi-/many-core clusters and new areas of HPC applications,
performance of large message communication is becoming more important. MPI libraries …

DMA-assisted, intranode communication in GPU accelerated systems

F Ji, AM Aji, J Dinan, D Buntinas… - 2012 IEEE 14th …, 2012 - ieeexplore.ieee.org
Accelerator awareness has become a pressing issue in data movement models, such as
MPI, because of the rapid deployment of systems that utilize accelerators. In our previous …