Algorithm-based fault-tolerant parallel sorting

ET Camargo, EPD Junior - inf.ufpr.br
High performance computing (HPC) systems often require substantial resources, and can
take up to several hours or days to execute. Upon a failure, it is important to lose as little …

Compiler and system for resilient distributed heterogeneous graph analytics

GS Gill - 2020 - repositories.lib.utexas.edu
Graph analytics systems are used in a wide variety of applications including health care,
electronic circuit design, machine learning, and cybersecurity. Graph analytics systems must …

[Book] Application-based Focused Recovery (ABFR): Convenient Management of Latent Error Resilience Using Application Knowledge

A Fang - 2018 - search.proquest.com
Supercomputers continue to increase in scale and complexity to meet the demands of
science and engineering. Exascale systems face high error rates due to increasing scale (10 …

Lazy Fault Recovery for Redundant MPI

E Saliba - 2019 - search.proquest.com
Distributed Systems (DS), where multiple computers share a workload across a network, are
used everywhere, from data intensive computations to storage and machine learning. DS …

A fault tolerant high-performance reduction framework in complex environment

LI Chao, Z Changhai, YAN Haihua… - 北京航空航天大学 …, 2018 - bhxb.buaa.edu.cn
Reduction is one of the most commonly used collective communication operations for
parallel applications. There are two problems for the existing reduction algorithms: First, they …

Técnicas para a Construção de Sistemas MPI Tolerantes a Falhas

ET Camargo, EP Duarte Jr - tsi.td.utfpr.edu.br
MPI is one of the main standards for developing parallel and distributed
applications based on the message-passing paradigm. Several systems of …

Algorithm Based Fault Tolerance: A Perspective from Algorithmic and Communication Characteristics of Parallel Algorithms

U Kabir - 2017 - spectrum.library.concordia.ca
Checkpoint and recovery cost imposed by checkpoint/restart (CP/R) is a crucial performance
issue for high-performance computing (HPC) applications. In comparison, Algorithm-Based …

Parallelization and fault tolerance supporting near-real-time data processing for bioremediation

D Hakkarinen - 2013-Mines Theses & Dissertations, 2013 - repository.mines.edu
The push toward interdisciplinary research has expanded the interactions and complexity of
research tasks. In the course of this dissertation, we explore three separate, but ultimately …

Notice of Retraction: Differences of Competitive Sports Views in Chinese and Western Sports Culture

D Yang, H Luo, C Gang - … on e-Education, e-Business, e …, 2010 - ieeexplore.ieee.org
Notice of Retraction: Differences of Competitive Sports Views in Chinese and Western
Sports Culture. Notice of Retraction: After careful and considered review of the content of …