A survey on in-network computing: Programmable data plane and technology specific applications

S Kianpisheh, T Taleb - IEEE Communications Surveys & …, 2022 - ieeexplore.ieee.org
In comparison with cloud computing, edge computing offers processing at locations closer to
end devices and reduces the user experienced latency. The new recent paradigm of in …

In-network machine learning using programmable network devices: A survey

C Zheng, X Hong, D Ding, S Vargaftik… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
Machine learning is widely used to solve networking challenges, ranging from traffic
classification and anomaly detection to network configuration. However, machine learning …

A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters

Y Jiang, Y Zhu, C Lan, B Yi, Y Cui, C Guo - 14th USENIX Symposium on …, 2020 - usenix.org
Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …

Scaling distributed machine learning with in-network aggregation

A Sapio, M Canini, CY Ho, J Nelson, P Kalnis… - … USENIX Symposium on …, 2021 - usenix.org
Training machine learning models in parallel is an increasingly important workload. We
accelerate distributed parallel training by designing a communication primitive that uses a …

An exhaustive survey on P4 programmable data plane switches: Taxonomy, applications, challenges, and future trends

EF Kfoury, J Crichigno, E Bou-Harb - IEEE Access, 2021 - ieeexplore.ieee.org
Traditionally, the data plane has been designed with fixed functions to forward packets using
a small set of protocols. This closed-design paradigm has limited the capability of the …

Offloading distributed applications onto SmartNICs using iPipe

M Liu, T Cui, H Schuh, A Krishnamurthy… - Proceedings of the …, 2019 - dl.acm.org
Emerging Multicore SoC SmartNICs, enclosing rich computing resources (e.g., a multicore
processor, onboard DRAM, accelerators, programmable DMA engines), hold the potential to …

ATP: In-network aggregation for multi-tenant learning

CL Lao, Y Le, K Mahajan, Y Chen, W Wu… - … USENIX Symposium on …, 2021 - usenix.org
Distributed deep neural network training (DT) systems are widely deployed in clusters where
the network is shared across multiple tenants, i.e., multiple DT jobs. Each DT job computes …

Do switches dream of machine learning? Toward in-network classification

Z Xiong, N Zilberman - Proceedings of the 18th ACM workshop on hot …, 2019 - dl.acm.org
Machine learning is currently driving a technological and societal revolution. While
programmable switches have been proven to be useful for in-network computing, machine …

MIND: In-network memory management for disaggregated data centers

S Lee, Y Yu, Y Tang, A Khandelwal, L Zhong… - Proceedings of the …, 2021 - dl.acm.org
Memory disaggregation promises transparent elasticity, high resource utilization and
hardware heterogeneity in data centers by physically separating memory and compute into …

The programmable data plane: Abstractions, architectures, algorithms, and applications

O Michel, R Bifulco, G Retvari, S Schmid - ACM Computing Surveys …, 2021 - dl.acm.org
Programmable data plane technologies enable the systematic reconfiguration of the low-
level processing steps applied to network packets and are key drivers toward realizing the …