Hand: A hybrid approach to accelerate non-contiguous data movement using MPI datatypes on...

R Shi, Y Gan, Y Wang - 2018 IEEE 26th international …, 2018 - ieeexplore.ieee.org

Testing a scalability bottleneck requires a large system to generate sufficient load, which is
usually not accessible to researchers. To address this problem, this paper extrapolates the …

Cited by 44 · Related articles · All 4 versions

Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters

R Shi, S Potluri, K Hamidouche… - … Conference on High …, 2014 - ieeexplore.ieee.org

Increasing number of MPI applications are being ported to take advantage of the compute
power offered by GPUs. Data movement on GPU clusters continues to be the major …

Cited by 59 · Related articles · All 3 versions

[PDF] archive.org

[PDF] Software Aging and Multifractality of Memory Resources.

M Shereshevsky, J Crowell, B Cukic, V Gandikota… - DSN, 2003 - scholar.archive.org

We investigate the dynamics of monitored memory resource utilizations in an operating
system under stress using quantitative methods of fractal analysis. In the experiments, we …

Cited by 97 · Related articles · All 5 versions

[PDF] nsf.gov

Automatic irregularity-aware fine-grained workload partitioning on integrated architectures

F Zhang, J Zhai, B Wu, B He, W Chen… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org

The integrated architecture that features both CPU and GPU on the same die is an emerging
and promising architecture for fine-grained CPU-GPU collaboration. However, the …

Cited by 27 · Related articles · All 6 versions

[PDF] sciencedirect.com

FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures

JM Hashmi, CH Chu, S Chakraborty… - Journal of Parallel and …, 2020 - Elsevier

This paper addresses the challenges of MPI derived datatype processing and proposes
FALCON-X—A Fast and Low-overhead Communication framework for optimized zero-copy …

Cited by 16 · Related articles · All 2 versions

High performance MPI datatype support with user-mode memory registration: Challenges, designs, and benefits

M Li, H Subramoni, K Hamidouche… - … on Cluster Computing, 2015 - ieeexplore.ieee.org

Noncontiguous data communication has been heavily adopted in scientific applications,
especially for those written with MPI. Common strategies to handle noncontiguous data, like …

Cited by 23 · Related articles · All 5 versions

[PDF] nsf.gov

Dynamic kernel fusion for bulk non-contiguous data transfer on GPU clusters

CH Chu, KS Khorassani, Q Zhou… - 2020 IEEE …, 2020 - ieeexplore.ieee.org

In the last decade, many scientific applications have been significantly accelerated by large-scale
GPU systems. However, the movement of non-contiguous GPU-resident data is one of …

Cited by 11 · Related articles · All 3 versions

Distributed join algorithms on multi-CPU clusters with GPUDirect RDMA

C Guo, H Chen, F Zhang, C Li - … of the 48th International Conference on …, 2019 - dl.acm.org

In data management systems, query processing on GPUs or distributed clusters have
proven to be an effective method for high efficiency. However, the high PCIe data transfer …

Cited by 11 · Related articles

High-performance adaptive MPI derived datatype communication for modern Multi-GPU systems

CH Chu, JM Hashmi, KS Khorassani… - 2019 IEEE 26th …, 2019 - ieeexplore.ieee.org

The recent advent of the NVLink interconnect and Peripheral Component Interconnect
express (PCIe) switch has resulted in the creation of extremely dense Graphics Processing …

Cited by 9 · Related articles · All 2 versions

[PDF] nsf.gov

Network assisted non-contiguous transfers for GPU-aware MPI libraries

KK Suresh, KS Khorassani, CC Chen… - … IEEE Symposium on …, 2022 - ieeexplore.ieee.org

The importance of GPUs in accelerating HPC applications is evident by the fact that a large
number of super-computing clusters are GPU-enabled. Many of these HPC applications use …

Cited by 5 · Related articles · All 9 versions
