A survey of large-scale deep learning serving system optimization: Challenges and opportunities

F Yu, D Wang, L Shangguan, M Zhang, X Tang… - arXiv preprint arXiv …, 2021 - arxiv.org
Deep Learning (DL) models have achieved superior performance in many application
domains, including vision, language, medical, commercial ads, entertainment, etc. With the …

MTM: Rethinking Memory Profiling and Migration for Multi-Tiered Large Memory

J Ren, D Xu, J Ryu, K Shin, D Kim, D Li - Proceedings of the Nineteenth …, 2024 - dl.acm.org
Multi-terabyte large memory systems are often characterized by more than two memory tiers
with different latency and bandwidth. Multi-tiered large memory systems call for rethinking of …

CoNST: Code Generator for Sparse Tensor Networks

S Raje, Y Xu, A Rountev, E Valeev… - ACM Transactions on …, 2024 - dl.acm.org
Sparse tensor networks represent contractions over multiple sparse tensors. Tensor
contractions are higher-order analogs of matrix multiplication. Tensor networks arise …

Optimizing large-scale plasma simulations on persistent memory-based heterogeneous memory with effective data placement across memory hierarchy

J Ren, J Luo, I Peng, K Wu, D Li - Proceedings of the ACM International …, 2021 - dl.acm.org
Particle simulations of plasma are important for understanding plasma dynamics in space
weather and fusion devices. However, production simulations that use billions and even …

SparseLNR: accelerating sparse tensor computations using loop nest restructuring

A Dias, K Sundararajah, C Saumya… - Proceedings of the 36th …, 2022 - dl.acm.org
Sparse tensor algebra computations have become important in many real-world
applications like machine learning, scientific simulations, and data mining. Hence …

Merchandiser: Data placement on heterogeneous memory for task-parallel HPC applications with load-balance awareness

Z Xie, J Liu, J Li, D Li - Proceedings of the 28th ACM SIGPLAN Annual …, 2023 - dl.acm.org
The emergence of heterogeneous memory (HM) provides a cost-effective and high-
performance solution to memory-consuming HPC applications. Deciding the placement of …

FlexMem: Adaptive Page Profiling and Migration for Tiered Memory

D Xu, J Ryu, K Shin, P Su, D Li - 2024 USENIX Annual Technical …, 2024 - usenix.org
Tiered memory, combining multiple memory components with different performance and
capacity, provides a cost-effective solution to increase memory capacity and improve …

MemHC: an optimized GPU memory management framework for accelerating many-body correlation

Q Wang, Z Peng, B Ren, J Chen… - ACM Transactions on …, 2022 - dl.acm.org
The many-body correlation function is a fundamental computation kernel in modern physics
computing applications, e.g., Hadron Contractions in Lattice quantum chromodynamics …

Single-node partitioned-memory for huge graph analytics: cost and performance trade-offs

S Ghosh, NR Tallent, M Minutoli… - Proceedings of the …, 2021 - dl.acm.org
Because of cost, non-volatile memory NVDIMMs such as Intel Optane are attractive in single-
node big-memory systems. We evaluate performance and cost trade-offs when using …

Efficient Utilization of Multi-Threading Parallelism on Heterogeneous Systems for Sparse Tensor Contraction

G Xiao, C Yin, Y Chen, M Duan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Many fields of scientific simulation, such as chemistry and condensed matter physics, are
increasingly eschewing dense tensor contraction in favor of sparse tensor contraction. In this …