Fundamentals of fault-tolerant distributed computing in asynchronous environments

FC Gärtner - ACM Computing Surveys (CSUR), 1999 - dl.acm.org
Fault tolerance in distributed computing is a wide area with a significant body of literature
that is vastly diverse in methodology and terminology. This paper aims at structuring the …

A comprehensive survey on internet outages

G Aceto, A Botta, P Marchetta, V Persico… - Journal of Network and …, 2018 - Elsevier
Internet outages are inevitable, frequent, opaque, and expensive. To make things worse,
they are poorly understood, while a deep understanding of them is essential for …

[BOOK][B] Computer architecture: a quantitative approach

JL Hennessy, DA Patterson - 2011 - books.google.com
Computer Architecture: A Quantitative Approach, Fifth Edition, explores the ways that
software and technology in the cloud are accessed by digital media, such as cell phones …

Resilience and survivability in communication networks: Strategies, principles, and survey of disciplines

JPG Sterbenz, D Hutchison, EK Çetinkaya, A Jabbar… - Computer …, 2010 - Elsevier
The Internet has become essential to all aspects of modern life, and thus the consequences
of network disruption have become increasingly severe. It is widely recognised that the …

Robustness and evolvability in living systems

A Wagner - 2013 - torrossa.com
Living things are unimaginably complex, yet they have withstood a withering assault of
harmful influences over several billion years. These influences include cataclysmic changes …

Why do Internet services fail, and what can be done about it?

D Oppenheimer, A Ganapathi… - 4th Usenix Symposium on …, 2003 - usenix.org
In 1986 Jim Gray published his landmark study of the causes of failures of Tandem systems
and the techniques Tandem used to prevent such failures. See J. Gray, Why do computers …

Restoration of services in interdependent infrastructure systems: A network flows approach

EE Lee II, JE Mitchell… - IEEE Transactions on …, 2007 - ieeexplore.ieee.org
Modern society depends on the operations of civil infrastructure systems, such as
transportation, energy, telecommunications, and water. These systems have become so …

Quantitative evaluation of DC microgrids availability: Effects of system architecture and converter topology design choices

A Kwasinski - IEEE Transactions on Power Electronics, 2010 - ieeexplore.ieee.org
This paper presents a quantitative method to evaluate dc microgrids availability by
identifying and calculating minimum cut sets occurrence probability for different microgrid …

Evaluation of network resilience, survivability, and disruption tolerance: analysis, topology generation, simulation, and experimentation

JPG Sterbenz, EK Çetinkaya, MA Hameed… - Telecommunication …, 2013 - Springer
As the Internet becomes increasingly important to all aspects of society, the consequences
of disruption become increasingly severe. Thus it is critical to increase the resilience and …

Improving the Reliability of Internet Paths with One-hop Source Routing.

PK Gummadi, HV Madhyastha, SD Gribble, HM Levy… - OSDI, 2004 - usenix.org
Krishna P. Gummadi, Harsha V …