Evaluation of job-scheduling strategies for grid computing

V Hamscher, U Schwiegelshohn, A Streit… - Grid Computing—GRID …, 2000 - Springer
In this paper, we discuss typical scheduling structures that occur in computational grids.
Scheduling algorithms and selection strategies applicable to these structures are introduced …

The distributed ASCI supercomputer project

H Bal, R Bhoedjang, R Hofman, C Jacobs… - ACM SIGOPS …, 2000 - dl.acm.org
The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system
consisting of four cluster computers at different locations. DAS has been used for research …

Dynamic load balancing in parallel execution of cellular automata

A Giordano, A De Rango, R Rongo… - … on Parallel and …, 2020 - ieeexplore.ieee.org
The allocation of the computational load across different processing elements is an
important issue in parallel computing. Indeed, an unbalanced load distribution can strongly …

Supporting internet-scale multi-agent systems

NJE Wijngaards, BJ Overeinder, M van Steen… - Data & Knowledge …, 2002 - Elsevier
The Internet provides a large-scale environment for (intelligent) software agents. Agents are
autonomous (mobile) processes, capable of communication with other agents, interaction …

[PDF] Transparent user-level checkpointing for the Native POSIX Thread Library for Linux

M Rieker, J Ansel, G Cooperman - PDPTA, 2006 - people.csail.mit.edu
Checkpointing of single-threaded applications has been long studied [3],[6],[8],[12],[15].
Much less research has been done for user-level checkpointing of multithreaded …

A fault-tolerant hybrid resource allocation model for dynamic computational grid

S Sheikh, A Nagaraju, M Shahid - Journal of Computational Science, 2021 - Elsevier
Effectual allocation of resources with fault tolerance is one of the key targets in any
computational grid environment to accomplish the task execution on time. In this paper, a …

Current practice and a direction forward in checkpoint/restart implementations for fault tolerance

JC Sancho, F Petrini, K Davis… - 19th IEEE …, 2005 - ieeexplore.ieee.org
Checkpoint/restart is a general idea for which particular implementations enable various
functionalities in computer systems, including process migration, gang scheduling …

[BOOK][B] Coordinated checkpoint/restart process fault tolerance for MPI applications on HPC systems

J Hursey - 2010 - search.proquest.com
Scientists use advanced computing techniques to assist in answering the complex questions
at the forefront of discovery. The High Performance Computing (HPC) scientific applications …

User-level process checkpoint and restore for migration

M Bozyigit, M Wasiq - ACM SIGOPS Operating Systems Review, 2001 - dl.acm.org
In simple words, process checkpointing means saving the state of a process, so that, it can
be reconstructed in the future. Checkpointing followed by restore is important for the purpose …

Agent factory: Generative migration of mobile agents in heterogeneous environments

FMT Brazier, BJ Overeinder, M van Steen… - Proceedings of the …, 2002 - dl.acm.org
In most of today's agent systems migration of agents requires homogeneity in the
programming language and/or agent platform in which an agent has been designed. In this …