A survey of large-scale deep learning serving system optimization: Challenges and opportunities

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org

Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Cited by 4 · Related articles · All 4 versions

[PDF] arxiv.org

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org

Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

Cited by 24 · Related articles · All 3 versions

[PDF] usenix.org

Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent

Q Weng, L Yang, Y Yu, W Wang, X Tang… - 2023 USENIX Annual …, 2023 - usenix.org

Large tech companies are piling up a massive number of GPUs in their server fleets to run
diverse machine learning (ML) workloads. However, these expensive devices often suffer …

Cited by 16 · Related articles · All 10 versions

[PDF] acm.org

DyCL: Dynamic neural network compilation via program rewriting and graph optimization

S Chen, S Wei, C Liu, W Yang - Proceedings of the 32nd ACM SIGSOFT …, 2023 - dl.acm.org

The deep learning (DL) compiler serves as a vital infrastructure component to enable the
deployment of deep neural networks on diverse hardware platforms such as mobile devices …

Cited by 3 · Related articles · All 5 versions

[PDF] mdpi.com

A deep learning model of spatial distance and named entity recognition (SD-NER) for flood mark text classification

R Szczepanek - Water, 2023 - mdpi.com

Information on historical flood levels can be communicated verbally, in documents, or in the
form of flood marks. The latter are the most useful from the point of view of public awareness …

Cited by 5 · Related articles · All 5 versions

[PDF] mdpi.com

HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving

H Mo, L Zhu, L Shi, S Tan, S Wang - Electronics, 2023 - mdpi.com

To accelerate the inference of machine-learning (ML) model serving, clusters of machines
require the use of expensive hardware accelerators (e.g., GPUs) to reduce execution time …

Cited by 3 · Related articles · All 3 versions

[PDF] ieee.org

Achieving Peak Performance for Large Language Models: A Systematic Review

ZRK Rostam, S Szénási, G Kertész - IEEE Access, 2024 - ieeexplore.ieee.org

In recent years, large language models (LLMs) have achieved remarkable success in
natural language processing (NLP). LLMs require an extreme amount of parameters to …

