Contention awareness and fault-tolerant scheduling for precedence constrained tasks in heterogeneous systems

A Benoit, M Hakem, Y Robert - Parallel Computing, 2009 - Elsevier
Heterogeneous distributed systems are widely deployed for executing computationally
intensive parallel applications with diverse computing needs. Such environments require …

An intelligent management of fault tolerance in cluster using RADICMPI

A Duarte, D Rexachs, E Luque - European Parallel Virtual Machine …, 2006 - Springer
Independence of special elements, transparency and scalability are very significant features
required from the fault tolerance schemes for modern clusters of computers. In order to …

Optimizing latency and reliability of pipeline workflow applications

A Benoit, V Rehn-Sonigo… - 2008 IEEE International …, 2008 - ieeexplore.ieee.org
Mapping applications onto heterogeneous platforms is a difficult challenge, even for simple
application patterns such as pipeline graphs. The problem is even more complex when …

A fault tolerant approach in cluster computing system

T Shwe, W Aye - 2008 5th International Conference on …, 2008 - ieeexplore.ieee.org
A long-term trend in high performance computing is the increasing number of nodes in
parallel computing platforms, which entails a higher failure probability. Hence, fault …

[Book] RADIC: a powerful fault-tolerant architecture

AA Duarte - 2007 - ddd.uab.cat
Fault tolerance has become an important requirement for computer engineers
and software developers, because the occurrence of faults …

Who needs a scheduler?

A Benoit, L Marchal, Y Robert - 2008 - hal-lara.archives-ouvertes.fr
This position paper advocates the need for scheduling. Even if the resources at our disposal
were to become abundant and cheap, not to say unlimited and free (a perspective that is not …

Algorithms and scheduling techniques for clusters and grids

A Benoit, L Marchal, Y Robert… - High Speed and Large …, 2009 - ebooks.iospress.nl
The main objective of this chapter is to show the need for algorithmic and scheduling
techniques. Even if the resources at our disposal were to become abundant and cheap, not to say …

Functional tests of the RADIC fault tolerance architecture

A Duarte, D Rexachs, E Luque - … International Conference on …, 2007 - ieeexplore.ieee.org
Clusters with thousands of nodes are a reality, and the current trend indicates that they are
becoming larger. Such large clusters are subject to a relatively high fault frequency so a fault …

Scheduling for numerical linear algebra library at scale

J Kurzak, H Ltaief, JJ Dongarra… - High Speed and Large …, 2009 - ebooks.iospress.nl
State-of-the-art dense linear algebra software, such as the LAPACK and ScaLAPACK
libraries, suffers performance losses on multicore processors due to its inability to fully …

Recuperando prestaciones en clusters tras ocurrencia de fallos utilizando RADIC [Recovering performance in clusters after the occurrence of faults using RADIC]

GA Santos, A Duarte… - … de Ciencias de la …, 2006 - sedici.unlp.edu.ar
After recovery from a fault, applications lose performance, largely
because the planned number of nodes has decreased and because of the loss caused by the …