Data center clusters that run DNN training jobs are inherently heterogeneous. They have GPUs and CPUs for computation and network bandwidth for distributed training. However …
It is commonly believed that datacenter networking software must sacrifice generality to attain high performance. The popularity of specialized distributed systems designed …
Datacenter systems and I/O devices now run at single-digit microsecond latencies, requiring ns-scale operating systems. Traditional kernel-based operating systems impose an …
Emerging hardware such as persistent memory (PM) and high-speed NICs is promising for building efficient key-value stores. However, we observe that the small-sized access pattern in …
SY Tsai, Y Shan, Y Zhang - 2020 USENIX Annual Technical Conference …, 2020 - usenix.org
Many datacenters and clouds manage storage systems separately from computing services for better manageability and resource utilization. These existing disaggregated storage …
M Marty, M de Kruijf, J Adriaens, C Alfeld… - Proceedings of the 27th …, 2019 - dl.acm.org
This paper presents our design and experience with a microkernel-inspired approach to host networking called Snap. Snap is a userspace networking system that supports Google's …
Far memory systems allow an application to transparently access local memory as well as memory belonging to remote machines. Fault tolerance is a critical property of any practical …
B Zhu, Y Chen, Q Wang, Y Lu, J Shu - ACM Transactions on Storage …, 2021 - dl.acm.org
Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems …