Exploring Winograd convolution for cost-effective neural network fault tolerance

X Xue, C Liu, B Liu, H Huang, Y Wang… - … Transactions on Very …, 2023 - ieeexplore.ieee.org
Winograd convolution is widely used to improve convolution performance and computational
efficiency because it reduces the number of multiplications, but the reliability issues brought …

Towards resilient and energy efficient scalable Krylov solvers

Z Miao, JC Calhoun, R Ge - Parallel Computing, 2025 - Elsevier
Exascale computing must simultaneously address both energy efficiency and resilience as
power limits impact scalability and faults are more common. Unfortunately, energy efficiency …

Guser: A GPGPU Power Stressmark Generator

Y Shan, Y Yang, X Qian, Z Yu - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
A power stressmark is crucial for estimating the Thermal Design Power (TDP) of GPGPUs to
ensure efficient power control. This paper proposes Guser, the first systematic methodology …

Light-Weight Fault Tolerant Attention for Large Language Model Training

Y Liang, X Li, J Ren, A Li, B Fang, J Chen - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable performance in various
natural language processing tasks. However, the training of these models is computationally …