Service placement for detecting and localizing failures using end-to-end observations

T He, N Bartolini, H Khamfroush, IJ Kim… - 2016 IEEE 36th …, 2016 - ieeexplore.ieee.org
We consider the problem of placing services in a telecommunication network in the
presence of failures. In contrast to existing service placement algorithms that focus on …

Non-deterministic diagnosis of end-to-end service failures in a multi-layer communication system

M Steinder, AS Sethi - Proceedings Tenth International …, 2001 - ieeexplore.ieee.org
Fault localization is a process of isolating faults responsible for the observable
malfunctioning of the managed system. Previously, fault localization efforts concentrated …

Service failure diagnosis in service function chain

S Zhang, Y Wang, W Li, X Qiu - 2017 19th Asia-Pacific Network …, 2017 - ieeexplore.ieee.org
Network function virtualization (NFV) is a powerful emerging technique with widespread
applicability. It provides Network Functions (NFs) through software virtualization techniques …

Network capability in localizing node failures via end-to-end path measurements

L Ma, T He, A Swami, D Towsley… - IEEE/ACM Transactions …, 2016 - ieeexplore.ieee.org
We investigate the capability of localizing node failures in communication networks from
binary states (normal/failed) of end-to-end paths. Given a set of nodes of interest, uniquely …

A trace-log-clusterings-based fault localization approach to microservice systems

CA Sun, T Zeng, W Zuo, H Liu - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Microservice architecture has been widely used for the development of large-scale
distributed applications. Microservice systems normally have high complexity and loose …

Shedding light on enterprise network failures using spotlight

D John, P Prakash, RR Kompella… - 2010 29th IEEE …, 2010 - ieeexplore.ieee.org
Fault localization in enterprise networks is extremely challenging. A recent approach called
Sherlock makes some headway into this problem by using an inference algorithm over a …

Network topology inference with partial information

B Holbert, S Tati, S Silvestri… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Full knowledge of the routing topology of the Internet is useful for a multitude of network
management tasks. However, the full topology is often not known and is instead estimated …

Lifeguard: Local health awareness for more accurate failure detection

A Dadgar, J Phillips, J Currey - 2018 48th Annual IEEE/IFIP …, 2018 - ieeexplore.ieee.org
SWIM is a peer-to-peer group membership protocol, with attractive scaling and robustness
properties. However, our experience supporting an implementation of SWIM shows that a …

Towards more effective and explainable fault management using cross-layer service topology

DR Mathews, M Verma, J Lakshmi… - 2022 IEEE 15th …, 2022 - ieeexplore.ieee.org
As microservice architecture becomes prominent, existing fault management techniques to
deal with service disruption become limiting mainly due to the amount of data needed to be …

Localizing and explaining faults in microservices using distributed tracing

J Rios, S Jha, L Shwartz - 2022 IEEE 15th International …, 2022 - ieeexplore.ieee.org
Finding the exact location of a fault in a large distributed microservices application running
in containerized cloud environments can be very difficult and time-consuming. We present a …