S Li, Y Zhao, R Varma, O Salpekar, P Noordhuis… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely-adopted scientific computing package used in …
W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) has flourished in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …
The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …
Containers are widely used for resource management in datacenters. A common practice to support deep learning (DL) training in container clouds is to statically bind GPUs to …
Web applications rely heavily on software caches to achieve low-latency, high-throughput services. To adapt to changing workloads, three types of learned caches (learned evictions) …
M Yu, T Cao, W Wang, R Chen - 20th USENIX Symposium on …, 2023 - usenix.org
Serverless applications are typically composed of function workflows in which multiple short- lived functions are triggered to exchange data in response to events or state changes …
Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies to jointly address all …
Prediction serving systems are designed to provide large volumes of low-latency inferences from machine learning models. These systems mix data processing and computationally …
Large-scale computations are ubiquitous and demand exorbitant resources, with matrix multiplication being a prominent example. Multiplying high-dimensional matrices is …