Software fault tolerance in real-time systems: Identifying the future research questions

F Reghenzani, Z Guo, W Fornaciari - ACM Computing Surveys, 2023 - dl.acm.org
Tolerating hardware faults in modern architectures is becoming a prominent problem due to
the miniaturization of the hardware components, their increasing complexity, and the …

An efficient forecasting approach for resource utilization in cloud data center using CNN-LSTM model

S Ouhame, Y Hadi, A Ullah - Neural Computing and Applications, 2021 - Springer
Cloud computing provides different kind of services for users and provides with the help of
internet. The Infrastructure as a service is a service model that provides virtual computing …

A survey of fault-tolerance techniques for embedded systems from the perspective of power, energy, and thermal issues

S Safari, M Ansari, H Khdr, P Gohari-Nazari… - IEEE …, 2022 - ieeexplore.ieee.org
The relentless technology scaling has provided a significant increase in processor
performance, but on the other hand, it has led to adverse impacts on system reliability. In …

Impact of voltage scaling on soft errors susceptibility of multicore server CPUs

D Agiakatsikas, G Papadimitriou, V Karakostas… - Proceedings of the 56th …, 2023 - dl.acm.org
Microprocessor power consumption and dependability are both crucial challenges that
designers have to cope with due to shrinking feature sizes and increasing transistor counts …

Comparative analysis of soft-error sensitivity in LU decomposition algorithms on diverse GPUs

G Leon, JM Badia, JA Belloch, A Lindoso… - The Journal of …, 2024 - Springer
Graphics processing units (GPUs) have become integral to embedded systems and
supercomputing centres due to their large memory, cutting-edge technology and high …

Response of HPC hardware to neutron radiation at the dawn of exascale

A Bustos, AJ Rubio-Montero, R Méndez… - The Journal of …, 2023 - Springer
Every computation presents a small chance that an unexpected phenomenon ruins or
modifies its output. Computers are prone to errors that, although may be very unlikely, are …

Penelope: peer-to-peer power management

T Srivastava, H Zhang, H Hoffmann - Proceedings of the 51st …, 2022 - dl.acm.org
Large scale distributed computing setups rely on power management systems to enforce
tight power budgets. Existing systems use a central authority that redistributes excess power …

On evaluation of reliability increase in fault-tolerant multiprocessor systems

ВО Романкевич, КВ Морозов, АП Фесенюк… - Прикладні аспекти …, 2024 - aait.od.ua
The work is devoted to the problem of evaluating the reliability increase of a fault-tolerant
multiprocessor system by adding an extra processor to the system. It is assumed that the …

Calculation of the high-energy neutron flux for anticipating errors and recovery techniques in exascale supercomputer centres

H Asorey, R Mayo-Garcia - The Journal of Supercomputing, 2023 - Springer
The age of exascale computing has arrived, and the risks associated with neutron and other
atmospheric radiation are becoming more critical as the computing power increases; hence …

To improve scalability with Boolean matrix using efficient gossip failure detection and consensus algorithm for PeerSim simulator in IoT environment

S Kumar, JK Samriya, AS Yadav, M Kumar - International Journal of …, 2022 - Springer
Recent years have seen a rise in interest in peer-to-peer applications because of their
scalability, fault tolerance, and other noteworthy benefits. The main benefit of P2P …