DMTCP (distributed multithreaded checkpointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wide …
BA Azeem, M Helal - 2014 9th International Conference on …, 2014 - ieeexplore.ieee.org
Distributed applications running in a large cluster environment, such as cloud instances, will have shorter execution times. However, the application might suffer from sudden …
High Performance Computing (HPC) systems have been widely used by scientists and researchers in both industry and university laboratories to solve advanced computation …
In this article, we propose a domain-specific language for the "fault management for mission critical systems" domain that also supports rule-based operation. Variability management for …
A fault management framework has been developed where a rule-based event processing language is also developed that provides improvement to the existing approaches in terms …
Research on high-performance parallel and distributed systems has limitations with regard to automatic and transparent analysis, design, implementation, and execution …
BA Azeem, M Helal - arXiv preprint arXiv:2311.17545, 2023 - arxiv.org
Distributed applications running in a large cluster environment, such as cloud instances, will have shorter execution times. However, the application might suffer from sudden …
Content Centric Networking is a proposed future networking paradigm in which data is the central entity for communication and the correspondence model follows a two-step approach for data …
I Ljubuncic, A Rozenfeld, A Goldis, R Giri - ieee-hpec.org
Intel's chip design runs in a large-scale, globally distributed environment with 600,000 cores. In the current semiconductor market scenario, a combination of factors such as time to …