The programmable data plane: Abstractions, architectures, algorithms, and applications

O Michel, R Bifulco, G Retvari, S Schmid - ACM Computing Surveys …, 2021 - dl.acm.org
Programmable data plane technologies enable the systematic reconfiguration of the
low-level processing steps applied to network packets and are key drivers toward realizing the …

Caladan: Mitigating interference at microsecond timescales

J Fried, Z Ruan, A Ousterhout, A Belay - 14th USENIX Symposium on …, 2020 - usenix.org
The conventional wisdom is that CPU resources such as cores, caches, and memory
bandwidth must be partitioned to achieve performance isolation between tasks. Both the …

SmartNIC performance isolation with FairNIC: Programmable networking for the cloud

S Grant, A Yelam, M Bland, AC Snoeren - Proceedings of the Annual …, 2020 - dl.acm.org
Multiple vendors have recently released SmartNICs that provide both special-purpose
accelerators and programmable processing cores that allow increasingly sophisticated …

Nuberu: Reliable RAN virtualization in shared platforms

G Garcia-Aviles, A Garcia-Saavedra… - Proceedings of the 27th …, 2021 - dl.acm.org
RAN virtualization will become a key technology for the last mile of next-generation mobile
networks driven by initiatives such as the O-RAN alliance. However, due to the computing …

Annulus: A dual congestion control loop for datacenter and WAN traffic aggregates

A Saeed, V Gupta, P Goyal, M Sharif, R Pan… - Proceedings of the …, 2020 - dl.acm.org
Cloud services are deployed in datacenters connected through high-bandwidth Wide Area
Networks (WANs). We find that WAN traffic negatively impacts the performance of datacenter …

Understanding RDMA microarchitecture resources for performance isolation

X Kong, J Chen, W Bai, Y Xu, M Elhaddad… - … USENIX Symposium on …, 2023 - usenix.org
Recent years have witnessed the wide adoption of RDMA in the cloud to accelerate
first-party workloads and achieve cost savings by freeing up CPU cycles. Now cloud providers …

In-network aggregation with transport transparency for distributed training

S Liu, Q Wang, J Zhang, W Wu, Q Lin, Y Liu… - Proceedings of the 28th …, 2023 - dl.acm.org
Recent In-Network Aggregation (INA) solutions offload the all-reduce operation onto network
switches to accelerate and scale distributed training (DT). On end hosts, these solutions …

Automatically Reasoning About How Systems Code Uses the CPU Cache

R Iyer, K Argyraki, G Candea - 18th USENIX Symposium on Operating …, 2024 - usenix.org
We present a technique, called CFAR, that developers can use to reason precisely about
how their code, as well as third-party code, uses the CPU cache. Given a piece of systems …

vSoC: Efficient Virtual System-on-Chip on Heterogeneous Hardware

J Qiu, Z Zhou, Y Li, Z Li, F Qian, H Lin, D Gao… - Proceedings of the …, 2024 - dl.acm.org
Emerging mobile apps such as UHD video and AR/VR access diverse high-throughput
hardware devices, e.g., video codecs, cameras, and image processors. However, today's …

Twenty years after: Hierarchical Core-Stateless fair queueing

Z Yu, J Wu, V Braverman, I Stoica, X Jin - 18th USENIX Symposium on …, 2021 - usenix.org
Core-Stateless Fair Queueing (CSFQ) is a scalable algorithm proposed more than two
decades ago to achieve fair queueing without keeping per-flow state in the network …