Service recovery for large scale distributed publish and subscription services for cyber-physical systems and disaster management

CS Shih, HY Chen, ZY Yeh - 2014 IEEE International …, 2014 - ieeexplore.ieee.org
CS Shih, HY Chen, ZY Yeh
2014 IEEE International Conference on Cyber-Physical Systems …, 2014ieeexplore.ieee.org
Information and communication technology (ICT) played a critical role in disaster
management in last few decades. One example is the messaging service for disaster alerts,
rescue workers, and victims. Many of these messaging services are developing based on
existing messaging services and in ad hoc manner to meet the communication requirements
in different disaster scenario. However, most, if not all, existing messaging services are
designed under the assumption that the underlying network infrastructures are mostly …
Information and communication technology (ICT) played a critical role in disaster management in last few decades. One example is the messaging service for disaster alerts, rescue workers, and victims. Many of these messaging services are developing based on existing messaging services and in ad hoc manner to meet the communication requirements in different disaster scenario. However, most, if not all, existing messaging services are designed under the assumption that the underlying network infrastructures are mostly reliable. Unfortunately, this assumption is not valid during and after disasters. In this work, we designed and implemented a service recovery framework, including a landmark-based/centralized algorithm and a distributed algorithm, for publish/subscribe messaging services for disaster management. The developed mechanisms recover a failed service without manual efforts. The centralized algorithm uses a landmark node to monitor the services and to recover the failed one, the distributed algorithm is a Paxos-based algorithm to compile a consistent recovery plan among nodes, monitoring the failed service. We evaluated the performance for these two mechanisms, and discussed the proper use scenario for these two mechanisms. The results show that the centralized algorithm should only be used in a service network having no concurrent failure within a local area network, the distributed algorithm are neither sensitive to concurrent failures nor the size of service networks.
ieeexplore.ieee.org
以上显示的是最相近的搜索结果。 查看全部搜索结果