A survey on in-network computing: Programmable data plane and technology specific applications

S Kianpisheh, T Taleb - IEEE Communications Surveys & …, 2022 - ieeexplore.ieee.org
In comparison with cloud computing, edge computing offers processing at locations closer to
end devices and reduces the user-experienced latency. The new recent paradigm of in …

The programmable data plane: Abstractions, architectures, algorithms, and applications

O Michel, R Bifulco, G Retvari, S Schmid - ACM Computing Surveys …, 2021 - dl.acm.org
Programmable data plane technologies enable the systematic reconfiguration of the low-
level processing steps applied to network packets and are key drivers toward realizing the …

PyTorch FSDP: experiences on scaling fully sharded data parallel

Y Zhao, A Gu, R Varma, L Luo, CC Huang, M Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
It is widely acknowledged that large models have the potential to deliver superior
performance across a broad range of domains. Despite the remarkable progress made in …

Scaling distributed machine learning with in-network aggregation

A Sapio, M Canini, CY Ho, J Nelson, P Kalnis… - … USENIX Symposium on …, 2021 - usenix.org
Training machine learning models in parallel is an increasingly important workload. We
accelerate distributed parallel training by designing a communication primitive that uses a …

NetCache: Balancing key-value stores with fast in-network caching

X Jin, X Li, H Zhang, R Soulé, J Lee, N Foster… - Proceedings of the 26th …, 2017 - dl.acm.org
We present NetCache, a new key-value store architecture that leverages the power and
flexibility of new-generation programmable switches to handle queries on hot items and …

An exhaustive survey on P4 programmable data plane switches: Taxonomy, applications, challenges, and future trends

EF Kfoury, J Crichigno, E Bou-Harb - IEEE Access, 2021 - ieeexplore.ieee.org
Traditionally, the data plane has been designed with fixed functions to forward packets using
a small set of protocols. This closed-design paradigm has limited the capability of the …

Offloading distributed applications onto SmartNICs using iPipe

M Liu, T Cui, H Schuh, A Krishnamurthy… - Proceedings of the …, 2019 - dl.acm.org
Emerging Multicore SoC SmartNICs, enclosing rich computing resources (e.g., a multicore
processor, onboard DRAM, accelerators, programmable DMA engines), hold the potential to …

Poseidon: Mitigating volumetric DDoS attacks with programmable switches

M Zhang, G Li, S Wang, C Liu, A Chen, H Hu… - the 27th Network and …, 2020 - par.nsf.gov
Distributed Denial-of-Service (DDoS) attacks have become a critical threat to the Internet.
Due to the increasing number of vulnerable Internet of Things (IoT) devices, attackers can …

ATP: In-network aggregation for multi-tenant learning

CL Lao, Y Le, K Mahajan, Y Chen, W Wu… - … USENIX Symposium on …, 2021 - usenix.org
Distributed deep neural network training (DT) systems are widely deployed in clusters where
the network is shared across multiple tenants, i.e., multiple DT jobs. Each DT job computes …

NetChain: Scale-free sub-RTT coordination

X Jin, X Li, H Zhang, N Foster, J Lee, R Soulé… - … USENIX Symposium on …, 2018 - usenix.org
Coordination services are a fundamental building block of modern cloud systems, providing
critical functionalities like configuration management and distributed locking. The major …