Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent

Q Weng, L Yang, Y Yu, W Wang, X Tang… - 2023 USENIX Annual …, 2023 - usenix.org
Large tech companies are piling up a massive number of GPUs in their server fleets to run
diverse machine learning (ML) workloads. However, these expensive devices often suffer …

DyCL: Dynamic neural network compilation via program rewriting and graph optimization

S Chen, S Wei, C Liu, W Yang - Proceedings of the 32nd ACM SIGSOFT …, 2023 - dl.acm.org
The deep learning (DL) compiler serves as a vital infrastructure component to enable the
deployment of deep neural networks on diverse hardware platforms such as mobile devices …

A deep learning model of spatial distance and named entity recognition (SD-NER) for flood mark text classification

R Szczepanek - Water, 2023 - mdpi.com
Information on historical flood levels can be communicated verbally, in documents, or in the
form of flood marks. The latter are the most useful from the point of view of public awareness …

HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving

H Mo, L Zhu, L Shi, S Tan, S Wang - Electronics, 2023 - mdpi.com
To accelerate the inference of machine-learning (ML) model serving, clusters of machines
require the use of expensive hardware accelerators (e.g., GPUs) to reduce execution time …

Achieving Peak Performance for Large Language Models: A Systematic Review

ZRK Rostam, S Szénási, G Kertész - IEEE Access, 2024 - ieeexplore.ieee.org
In recent years, large language models (LLMs) have achieved remarkable success in
natural language processing (NLP). LLMs require an extreme amount of parameters to …

Enhanced Scheduling of AI Applications in Multi-Tenant Cloud Using Genetic Optimizations

S Kwon, H Bahn - Applied Sciences, 2024 - mdpi.com
The artificial intelligence (AI) industry is increasingly integrating with diverse sectors such as
smart logistics, FinTech, entertainment, and cloud computing. This expansion has led to the …

Improving the Performance of DNN-based Software Services using Automated Layer Caching

M Abedi, Y Iouannou, P Jamshidi… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep Neural Networks (DNNs) have become an essential component in many application
domains including web-based services. A variety of these services require high throughput …

Efficient and robust web scale language model based retrieval, generation, and understanding

DF Campos - 2023 - ideals.illinois.edu
Large language models effectively generate contextualized word representations across
languages, domains, and tasks. Driven by these abilities, these models have become a build …