RE Grant, MGF Dosanjh, MJ Levenhagen… - … Conference, ISC High …, 2019 - Springer
The MPI multithreading model has been historically difficult to optimize; the interface that it provides for threads was designed as a process-level interface. This model has led to …
The gap is widening between the processor clock speed of end-system architectures and network throughput capabilities. It is now physically possible to provide single-flow …
Techniques are disclosed to throttle bandwidth imbalanced data transfers. In some examples, an example computer-implemented method may include splitting a payload of a …
Applications often communicate data that is non-contiguous in the send or the receive buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non …
Current proposals for in-network data processing operate on data as it streams through a network switch or endpoint. Since compute resources must be available when data arrives …
In the Fully Sharded Data Parallel (FSDP) training pipeline, collective operations can be interleaved to maximize the communication/computation overlap. In this scenario …
S Di Girolamo, P Jolivet… - 2015 IEEE 23rd …, 2015 - ieeexplore.ieee.org
Network interface cards are one of the key components to achieve efficient parallel performance. In the past, they have gained new functionalities such as lossless …
G Sabin, M Rashti - 2015 National Aerospace and Electronics …, 2015 - ieeexplore.ieee.org
The SmartNIC is a User-Programmable 10GE NIC designed around industry standards to meet the demands of high performance networking in HPC and datacenter communities …
As network speeds increase, the overhead of processing incoming messages is becoming onerous enough that many manufacturers now provide network interface cards (NICs) with …