Survey on grid resource allocation mechanisms

MB Qureshi, MM Dehnavi, N Min-Allah… - Journal of Grid …, 2014 - Springer
Grid is a distributed high performance computing paradigm that offers various types of
resources (like computing, storage, communication) to resource-intensive user tasks. These …

Backfilling using system-generated predictions rather than user runtime estimates

D Tsafrir, Y Etsion, DG Feitelson - IEEE Transactions on …, 2007 - ieeexplore.ieee.org
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with
backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs …

Optimal scheduling in the multiserver-job model under heavy traffic

I Grosof, Z Scully, M Harchol-Balter… - Proceedings of the ACM …, 2022 - dl.acm.org
Multiserver-job systems, where jobs require concurrent service at many servers, occur
widely in practice. Essentially all of the theoretical work on multiserver-job systems focuses …

The RESET and MARC techniques, with application to multiserver-job analysis

I Grosof, Y Hong, M Harchol-Balter… - Performance …, 2023 - Elsevier
Multiserver-job (MSJ) systems, where jobs need to run concurrently across many
servers, are increasingly common in practice. The default service ordering in many settings …

Modeling user runtime estimates

D Tsafrir, Y Etsion, DG Feitelson - … 2005, Cambridge, MA, USA, June 19 …, 2005 - Springer
User estimates of job runtimes have emerged as an important component of the workload on
parallel machines, and can have a significant impact on how a scheduler treats different …

Priority-based consolidation of parallel workloads in the cloud

X Liu, C Wang, BB Zhou, J Chen… - … on Parallel and …, 2012 - ieeexplore.ieee.org
The cloud computing paradigm is attracting an increased number of complex applications to
run in remote data centers. Many complex applications require parallel processing …

Distributed job manager recovery

JR Challenger, LR Degenaro, JR Giles… - US Patent …, 2010 - Google Patents
A method is provided for the recovery of an instance of a job manager
running on one of a plurality of nodes used to execute the processing elements associated …

Exploring decentralized dynamic scheduling for grids and clouds using the community-aware scheduling algorithm

Y Huang, N Bessis, P Norrington, P Kuonen… - Future Generation …, 2013 - Elsevier
Job scheduling strategies have been studied for decades in a variety of scenarios. Due to
the new characteristics of the emerging computational systems, such as the grid and cloud …

Towards understanding HPC users and systems: a NERSC case study

GP Rodrigo, PO Östberg, E Elmroth, K Antypas… - Journal of Parallel and …, 2018 - Elsevier
High performance computing (HPC) scheduling landscape currently faces new challenges
due to the changes in the workload. Previously, HPC centers were dominated by tightly …

A machine learning approach for an HPC use case: The jobs queuing time prediction

C Vercellino, A Scionti, G Varavallo, P Viviani… - Future Generation …, 2023 - Elsevier
High-Performance Computing (HPC) domain provided the necessary tools to
support the scientific and industrial advancements we all have seen during the last decades …