Achieving microsecond-scale tail latency efficiently with approximate optimal scheduling

R Iyer, M Unal, M Kogias, G Candea - Proceedings of the 29th …, 2023 - dl.acm.org
Datacenter applications expect microsecond-scale service times and tightly bound tail
latency, with future workloads expected to be even more demanding. To address this …

RingLeader: Efficiently Offloading Intra-Server Orchestration to NICs

J Lin, A Cardoza, T Khan, Y Ro, BE Stephens… - … USENIX Symposium on …, 2023 - usenix.org
Careful orchestration of requests at a datacenter server is crucial to meet tight tail latency
requirements and ensure high throughput and optimal CPU utilization. Orchestration is multi …

Harvesting Memory-bound CPU Stall Cycles in Software with MSH

Z Luo, S Son, S Ratnasamy, S Shenker - 18th USENIX Symposium on …, 2024 - usenix.org
Memory-bound stalls account for a significant portion of CPU cycles in datacenter
workloads, which makes harvesting them to execute other useful work highly valuable …

Automatic parallelism management

S Westrick, M Fluet, M Rainey, UA Acar - Proceedings of the ACM on …, 2024 - dl.acm.org
On any modern computer architecture today, parallelism comes with a modest cost, born
from the creation and management of threads or tasks. Today, programmers battle this cost …

Efficient Microsecond-scale Blind Scheduling with Tiny Quanta

Z Luo, S Son, D Bali, E Amaro, A Ousterhout… - Proceedings of the 29th …, 2024 - dl.acm.org
A longstanding performance challenge in datacenter-based applications is how to efficiently
handle incoming client requests that spawn many very short (μs scale) jobs that must be …

Latency Interfaces for Systems Code

RR Iyer - 2023 - infoscience.epfl.ch
This thesis demonstrates that it is feasible for systems code to expose a latency interface that
describes its latency and related side effects for all inputs, just like the code's semantic …

A fully-functional Cache Control Coprocessor for Enzian

M Hässig - 2024 - research-collection.ethz.ch
The Enzian research computer offers a unique platform for testing novel cache coherence
protocols in hardware with its 48-core ThunderX CPU and FPGA connected over a coherent …