ConnectX-2 InfiniBand management queues: First investigation of the new support for network...

R Parizotto, BL Coelho, DC Nunes, I Haque… - ACM Computing …, 2023 - dl.acm.org

The demand for machine learning (ML) has increased significantly in recent decades,
enabling several applications, such as speech recognition, computer vision, and …

Cited by 19 Related articles All 3 versions

[PDF] nvidia.com

Scalable hierarchical aggregation protocol (SHArP): A hardware architecture for efficient data reduction

RL Graham, D Bureddy, P Lui… - … in HPC (COMHPC), 2016 - ieeexplore.ieee.org

Increased system size and a greater reliance on utilizing system parallelism to achieve
computational needs, requires innovative system architectures to meet the simulation …

Cited by 167 Related articles All 6 versions

[PDF] ict.ac.cn

High performance interconnect network for Tianhe system

XK Liao, ZB Pang, KF Wang, YT Lu, M Xie, J Xia… - Journal of Computer …, 2015 - Springer

In this paper, we present the Tianhe-2 interconnect network and message passing services.
We describe the architecture of the router and network interface chips, and highlight a set of …

Cited by 90 Related articles All 8 versions

[PDF] springer.com

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ Streaming-Aggregation Hardware Design and Evaluation

RL Graham, L Levi, D Burredy, G Bloch… - … Conference, ISC High …, 2020 - Springer

This paper describes the new hardware-based streaming-aggregation capability added to
Mellanox's Scalable Hierarchical Aggregation and Reduction Protocol in its HDR InfiniBand …

Cited by 44 Related articles All 5 versions

[PDF] ethz.ch

A RISC-V in-network accelerator for flexible high-performance low-power packet processing

S Di Girolamo, A Kurth, A Calotoiu… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org

The capacity of offloading data and control tasks to the network is becoming increasingly
important, especially if we consider the faster growth of network speed when compared to …

Cited by 33 Related articles All 21 versions

[PDF] susu.ru

Energy, memory, and runtime tradeoffs for implementing collective communication operations

T Hoefler, D Moor - Supercomputing frontiers and innovations, 2014 - superfri.susu.ru

Collective operations are among the most important communication operations in shared-
and distributed-memory parallel applications. In this paper, we analyze the tradeoffs …

Cited by 56 Related articles All 28 versions

[PDF] researchgate.net

High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT

K Kandalla, H Subramoni, K Tomko… - … Science-Research and …, 2011 - Springer

Three-dimensional FFT is an important component of many scientific computing applications
ranging from fluid dynamics, to astrophysics and molecular dynamics. P3DFFT is a widely …

Cited by 77 Related articles All 7 versions

[PDF] researchgate.net

The TH Express high performance interconnect networks

Z Pang, M Xie, J Zhang, Y Zheng, G Wang… - Frontiers of Computer …, 2014 - Springer

Interconnection network plays an important role in scalable high performance computer
(HPC) systems. The TH Express-2 interconnect has been used in MilkyWay-2 system to …

Cited by 59 Related articles All 7 versions

Cheetah: A framework for scalable hierarchical collective operations

R Graham, MG Venkata, J Ladd… - 2011 11th IEEE/ACM …, 2011 - ieeexplore.ieee.org

Collective communication operations, used by many scientific applications, tend to limit
overall parallel application performance and scalability. Computer systems are becoming …

Cited by 57 Related articles All 5 versions

[PDF] researchgate.net

Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities

RL Graham, S Poole, P Shamis, G Bloch… - … on Parallel & …, 2010 - ieeexplore.ieee.org

This paper explores the computation and communication overlap capabilities enabled by
the new CORE-Direct hardware capabilities introduced in the InfiniBand Network Interface …

Cited by 61 Related articles All 8 versions
