RowPress: Amplifying read disturbance in modern DRAM chips

H Luo, A Olgun, AG Yağlıkçı, YC Tuğrul… - Proceedings of the 50th …, 2023 - dl.acm.org
Memory isolation is critical for system reliability, security, and safety. Unfortunately, read
disturbance can break memory isolation in modern DRAM chips. For example, RowHammer …

Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions

P Rech - IEEE Transactions on Nuclear Science, 2024 - ieeexplore.ieee.org
Machine learning is among the greatest advancements in computer science and
engineering and is today used to classify or detect objects, a key feature in autonomous …

Understanding and mitigating hardware failures in deep learning training systems

Y He, M Hutton, S Chan, R De Gruijl… - Proceedings of the 50th …, 2023 - dl.acm.org
Deep neural network (DNN) training workloads are increasingly susceptible to hardware
failures in datacenters. For example, Google experienced "mysterious, difficult to identify …

Understanding fault-tolerance vulnerabilities in advanced SoC FPGAs for critical applications

N Cherezova, K Shibin, M Jenihhin, A Jutman - Microelectronics Reliability, 2023 - Elsevier
The emergence of heterogeneous FPGA-based SoCs and their growing complexity fueled
by the introduction of various accelerators bring the reliability aspect of these systems to the …

AVGI: Microarchitecture-driven, fast and accurate vulnerability assessment

G Papadimitriou, D Gizopoulos - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
We propose AVGI, a new Statistical Fault Injection (SFI)-based methodology, which delivers
orders of magnitude faster assessment of the Architectural Vulnerability Factor (AVF) of a …

Silent data corruptions: Microarchitectural perspectives

G Papadimitriou, D Gizopoulos - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Today more than ever before, academia, manufacturers, and hyperscalers acknowledge the
major challenge of silent data corruptions (SDCs) and aim on solutions to minimize its …

Fast and accurate error simulation for CNNs against soft errors

C Bolchini, L Cassano, A Miele… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The great quest for adopting AI-based computation for safety-/mission-critical applications
motivates the interest towards methods for assessing the robustness of the application w.r.t. …

Impact of voltage scaling on soft errors susceptibility of multicore server CPUs

D Agiakatsikas, G Papadimitriou, V Karakostas… - Proceedings of the 56th …, 2023 - dl.acm.org
Microprocessor power consumption and dependability are both crucial challenges that
designers have to cope with due to shrinking feature sizes and increasing transistor counts …

Understanding the Effects of Permanent Faults in GPU's Parallelism Management and Control Units

JD Guerrero Balaguera, JE Rodriguez Condia… - Proceedings of the …, 2023 - dl.acm.org
Modern Graphics Processing Units (GPUs) demand life expectancy extended to many years,
exposing the hardware to aging (i.e., permanent faults arising after the end-of-manufacturing …

Saca-FI: A microarchitecture-level fault injection framework for reliability analysis of systolic array based CNN accelerator

J Tan, Q Wang, K Yan, X Wei, X Fu - Future Generation Computer Systems, 2023 - Elsevier
As convolutional neural network (CNN) accelerators are being adopted in emerging safety-
critical areas, their reliability becomes prominent. The systolic array is widely used as the …