Characterization of large language model development in the datacenter

Q Hu, Z Ye, Z Wang, G Wang, M Zhang… - … USENIX Symposium on …, 2024 - usenix.org
Large Language Models (LLMs) have presented impressive performance across several
transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster …

sPIN: High-performance streaming Processing in the Network

T Hoefler, S Di Girolamo, K Taranov, RE Grant… - Proceedings of the …, 2017 - dl.acm.org
Optimizing communication performance is imperative for large-scale computing because
communication overheads limit the strong scalability of parallel applications. Today's …

Eflops: Algorithm and system co-design for a high performance distributed training platform

J Dong, Z Cao, T Zhang, J Ye, S Wang… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Deep neural networks (DNNs) have gained tremendous attractions as compelling solutions
for applications such as image classification, object detection, speech recognition, and so …

Invisible probe: Timing attacks with PCIe congestion side-channel

M Tan, J Wan, Z Zhou, Z Li - 2021 IEEE Symposium on Security …, 2021 - ieeexplore.ieee.org
PCIe (Peripheral Component Interconnect express) protocol is the de facto protocol to
bridge CPU and peripheral devices like GPU, NIC, and SSD drive. There is an increasing …

A design flow for scheduling spiking deep convolutional neural networks on heterogeneous neuromorphic system-on-chip

A Das - ACM Transactions on Embedded Computing Systems, 2023 - dl.acm.org
Neuromorphic systems-on-chip (NSoCs) integrate CPU cores and neuromorphic hardware
accelerators on the same chip. These platforms can execute spiking deep convolutional …

Hostping: Diagnosing intra-host network bottlenecks in RDMA servers

K Liu, Z Jiang, J Zhang, H Wei, X Zhong, L Tan… - … USENIX Symposium on …, 2023 - usenix.org
Intra-host networking was considered robust in the RDMA (Remote Direct Memory Access)
network and received little attention. However, as the RNIC (RDMA NIC) line rate increases …

Network-accelerated non-contiguous memory transfers

S Di Girolamo, K Taranov, A Kurth… - Proceedings of the …, 2019 - dl.acm.org
Applications often communicate data that is non-contiguous in the send- or the receive-buffer,
e.g., when exchanging a column of a matrix stored in row-major order. While non …

Cloud FPGA cartography using PCIe contention

S Tian, I Giechaskiel, W Xiong… - 2021 IEEE 29th Annual …, 2021 - ieeexplore.ieee.org
Public cloud infrastructures allow for easy, on-demand access to FPGA resources. However,
the low-level, direct access to the FPGA hardware exposes the infrastructure providers to …

A profile-based ai-assisted dynamic scheduling approach for heterogeneous architectures

T Geng, M Amaris, S Zuckerman, A Goldman… - International Journal of …, 2022 - Springer
While heterogeneous architectures are increasingly popular with High Performance
Computing systems, their effectiveness depends on how efficient the scheduler is at …

Maximizing I/O bandwidth for reverse time migration on heterogeneous large-scale systems

T Alturkestani, H Ltaief, D Keyes - … , Warsaw, Poland, August 24–28, 2020 …, 2020 - Springer
Reverse Time Migration (RTM) is an important scientific application for oil and gas
exploration. The 3D RTM simulation generates terabytes of intermediate data that does not …