Efficient Resource Disaggregation for Deep Learning Workloads

X Jin, Z Bai, Z Zhang, Y Zhu, Y Zhong… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Deep learning (DL) systems suffer from low resource utilization due to 1) monolithic server
model that tightly couples compute and memory; and 2) limited sharing between different …

Exploring the Asynchrony of Slow Memory Filesystem with EasyIO

B Zhu, Y Chen, J Shu - … of the Nineteenth European Conference on …, 2024 - dl.acm.org
We introduce EasyIO, a new approach to explore asynchronous I/O on filesystems designed
for (disaggregated) nonvolatile memories to improve CPU efficiency. EasyIO offloads …

Mitigating Write Disturbance in Non-Volatile Memory via Coupling Machine Learning with Out-of-Place Updates

R Wu, Z Shen, Z Yang, J Shu - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Non-volatile memory (NVM) opens up new opportunities to resolve scaling restrictions of
main memory, yet it is still hindered by the write disturbance (WD) problem. The WD problem …

Object-oriented Unified Encrypted Memory Management for Heterogeneous Memory Architectures

M Sha, Y Cai, S Wang, LTX Phan, F Li… - Proceedings of the ACM …, 2024 - dl.acm.org
In contemporary database applications, the demand for memory resources is intensively
high. To enhance adaptability to varying resource needs and improve cost efficiency, the …

DistR: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

H Ma, Y Qiao, S Liu, S Yu, Y Ni, Q Lu, J Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite being a powerful concept, distributed shared memory (DSM) has not been made
practical due to the extensive synchronization needed between servers to implement …

Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems

Y Li, B Tian, M Gao - arXiv preprint arXiv:2402.16343, 2024 - arxiv.org
Hybrid main memory systems combine both performance and capacity advantages from
heterogeneous memory technologies. With larger capacities, higher associativities, and finer …

A Fault‐tolerant model for tuple space coordination in distributed environments

M Kirti, AK Maurya, RS Yadav - Concurrency and Computation …, 2024 - Wiley Online Library
In distributed systems, tuple space is one of the coordination models that significantly
maximizes system performance against failure due to its space and time decoupling …

DDC: A Vision for a Disaggregated Datacenter

M Ewais, P Chow - arXiv preprint arXiv:2402.12742, 2024 - arxiv.org
Datacenters of today have maintained the same architecture for decades using the server as
the primary building block. However, this traditional approach suffers from under-utilization …

Providing scalable single‐operating‐system NUMA abstraction of physically discrete resources

BS An, MH Cha, SM Lee, WH Yang, HY Kim - ETRI Journal, 2024 - Wiley Online Library
With an explosive increase of data produced annually, researchers have been attempting to
develop solutions for systems that can effectively handle large amounts of data. Single …

Semantics-Guided Systems Foundations for Disaggregated Datacenters

H Ma - 2024 - escholarship.org
Resource disaggregation has emerged as a promising solution to enhance both resource
utilization and management efficiency in datacenters. Existing disaggregation solutions …