Slurm simulator: Improving Slurm scheduler performance on large HPC systems by utilization...

FV Zacarias, V Petrucci, R Nishtala, P Carpenter… - Journal of Parallel and …, 2021 - Elsevier

Many HPC applications suffer from a bottleneck in the shared caches, instruction execution
units, I/O or memory bandwidth, even though the remaining resources may be underutilized …

Cited by 14 · Related articles · All 6 versions

Exploring job running path to predict runtime on multiple production supercomputers

W Yang, X Liao, D Dong, J Yu - Journal of Parallel and Distributed …, 2023 - Elsevier

There are massive jobs submitted in the supercomputer, and the job management system is
typically deployed to schedule these jobs and allocate compute resources. FCFS (First …

Cited by 3 · Related articles · All 2 versions

[PDF] researchgate.net

Intelligent colocation of workloads for enhanced server efficiency

FV Zacarias, V Petrucci, R Nishtala… - 2019 31st …, 2019 - ieeexplore.ieee.org

Many server applications achieve only a fraction of their theoretical peak performance due to
bottlenecks in the shared caches, instruction execution units, I/O or memory bandwidth, even …

Cited by 12 · Related articles · All 4 versions

[PDF] nsf.gov

Quantifying server memory frequency margin and using it to improve performance in HPC systems

D Zhang, G Panwar, JB Kotra… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org

To maintain strong reliability, memory manufacturers label server memories at much slower
data rates than the highest data rates at which they can still operate correctly for most (e.g. …

Cited by 8 · Related articles · All 8 versions

Exploring the tradeoff between reliability and performance in HPC systems

C Walker, B Slade, G Bailey… - 2021 IEEE High …, 2021 - ieeexplore.ieee.org

Evaluating the trade-off space between performance and reliability is important for data
center operators as part of their supercomputer procurement, planning and acceptance …

Cited by 5 · Related articles · All 2 versions

[PDF] upv.es

[PDF] Optimized hardware configuration for high performance computing systems

S Hutchison, D Andresen, W Hsu… - Proceedings of the …, 2023 - personales.upv.es

When faced with upgrading or replacing High Performance Computing or High Throughput
Computing systems, system administrators can be overwhelmed by hardware options …

Cited by 2 · Related articles

[PDF] acm.org

Developing accurate Slurm simulator

NA Simakov, RL Deleon, Y Lin, PS Hoffmann… - … and Experience in …, 2022 - dl.acm.org

A new Slurm simulator compatible with the latest Slurm version has been produced. It was
constructed by systematically transforming the Slurm code step by step to maintain the …

Cited by 2 · Related articles · All 4 versions

[PDF] uni.lu

Optimizing the Resource and Job Management System of an Academic HPC & Research Computing Facility

S Varrette, E Kieffer, F Pinel - 2022 21st International …, 2022 - ieeexplore.ieee.org

High Performance Computing (HPC) is nowadays a strategic asset required to sustain the
surging demands for massive processing and data-analytic capabilities. In practice, the …

Cited by 2 · Related articles · All 6 versions

[PDF] arxiv.org

A resourceful coordination approach for multilevel scheduling

A Eleliemy, FM Ciorba - arXiv preprint arXiv:2103.05809, 2021 - arxiv.org

HPC users aim to improve their execution times without particular regard for increasing
system utilization. On the contrary, HPC operators favor increasing the number of executed …

Cited by 3 · Related articles · All 2 versions

DeletePop: A DLT Execution Time Predictor Based on Comprehensive Modeling

Y He, Y Zhou, E Shao, G Tan, N Sun - International Conference on …, 2023 - Springer

The modeling and simulation of Deep Learning Training (DLT) are challenging problems.
Due to the intricate parallel patterns, existing modelings and simulations do not consider …
