Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link

D Xu, Y Feng, K Shin, D Kim, H Jeon, D Li - pasalabs.org
Deep learning (DL) models are becoming bigger, easily exceeding the memory capacity of
a single accelerator. Recent progress in large DL training utilizes CPU memory as an …

DmRPC: Disaggregated Memory-aware Datacenter RPC for Data-intensive Applications

J Zhang, X Chen, Y Zhang, Z Wang - wangzeke.github.io
Modern datacenter applications are increasingly being built using a microservices
architecture. These microservices communicate with each other using datacenter RPCs …