Optimizing MPI Communication within large Multicore nodes with Kernel assistance

B Goglin, S Moreaud - Journal of Parallel and Distributed Computing, 2013 - Elsevier

The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …

Cited by 110 · Related articles · All 12 versions

[PDF] hal.science

Kernel assisted collective intra-node mpi communication among multi-core and many-core cpus

T Ma, G Bosilca, A Bouteiller, B Goglin… - 2011 International …, 2011 - ieeexplore.ieee.org

Shared memory is among the most common approaches to implementing message passing
within multicore nodes. However, current shared memory techniques do not scale with …

Cited by 61 · Related articles · All 10 versions

[PDF] souravc.com

Contention-aware kernel-assisted MPI collectives for multi-/many-core systems

S Chakraborty, H Subramoni… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

Multi-/many-core CPU based architectures are seeing widespread adoption due to their
unprecedented compute performance in a small power envelope. With the increasingly …

Cited by 32 · Related articles · All 4 versions

[PDF] archive.org

Gait analysis for human identification in frequency domain

S Yu, L Wang, W Hu, T Tan - … on Image and Graphics (ICIG'04), 2004 - ieeexplore.ieee.org

In this paper, we analyze the spatio-temporal human characteristic of moving silhouettes in
frequency domain, and find key Fourier descriptors that have better discriminatory capability …

Cited by 75 · Related articles · All 4 versions

[PDF] utk.edu

Process distance-aware adaptive MPI collective communications

T Ma, T Herault, G Bosilca… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org

Message Passing Interface (MPI) implementations provide a great flexibility to allow users to
arbitrarily bind processes to computing cores to fully exploit clusters of multicore/many-core …

Cited by 53 · Related articles · All 7 versions

[PDF] github.io

XCluster synopses for structured XML content

N Polyzotis, M Garofalakis - 22nd International Conference on …, 2006 - ieeexplore.ieee.org

We tackle the difficult problem of summarizing the path/branching structure and value
content of an XML database that comprises both numeric and textual values. We introduce a …

Cited by 69 · Related articles · All 10 versions

[PDF] psu.edu

A ugni-based asynchronous message-driven runtime system for cray supercomputers with gemini interconnect

Y Sun, G Zheng, LV Kalé, TR Jones… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org

Gemini, the network for the new Cray XE/XK systems, features low latency, high bandwidth
and strong scalability. Its hardware support for remote direct memory access enables …

Cited by 38 · Related articles · All 16 versions

[PDF] wiley.com

Optimizing point‐to‐point communication between adaptive MPI endpoints in shared memory

S White, LV Kale - Concurrency and Computation: Practice and …, 2020 - Wiley Online Library

Adaptive MPI is an implementation of the MPI standard that supports the virtualization of
ranks as user‐level threads, rather than OS processes. In this work, we optimize the …

Cited by 13 · Related articles · All 3 versions

[PDF] souravc.com

Cooperative rendezvous protocols for improved performance and overlap

S Chakraborty, M Bayatpour, J Hashmi… - … Conference for High …, 2018 - ieeexplore.ieee.org

With the emergence of larger multi-/many-core clusters and new areas of HPC applications,
performance of large message communication is becoming more important. MPI libraries …

Cited by 10 · Related articles · All 5 versions

[PDF] github.io

DMA-assisted, intranode communication in GPU accelerated systems

F Ji, AM Aji, J Dinan, D Buntinas… - 2012 IEEE 14th …, 2012 - ieeexplore.ieee.org

Accelerator awareness has become a pressing issue in data movement models, such as
MPI, because of the rapid deployment of systems that utilize accelerators. In our previous …

Cited by 19 · Related articles · All 14 versions
