Fast and accurate online video object segmentation via tracking parts

J Cheng, YH Tsai, WC Hung… - Proceedings of the …, 2018 - openaccess.thecvf.com
Online video object segmentation is a challenging task as it entails to process the image
sequence timely and accurately. To segment a target object through the video, numerous …

Posits and the state of numerical representations in the age of exascale and edge computing

A Poulos, SA McKee… - Software: Practice and …, 2022 - Wiley Online Library
Growing constraints on memory utilization, power consumption, and I/O throughput have
increasingly become limiting factors to the advancement of high performance computing …

Mitigating silent data corruptions in HPC applications across multiple program inputs

Y Huang, S Guo, S Di, G Li… - … Conference for High …, 2022 - ieeexplore.ieee.org
With the ever-shrinking size of transistors, silent data corruptions (SDCs) are becoming a
common yet serious issue in HPC. Selective instruction duplication (SID) is a widely used …

A tale of two injectors: End-to-end comparison of IR-level and assembly-level fault injection

L Palazzi, G Li, B Fang… - 2019 IEEE 30th …, 2019 - ieeexplore.ieee.org
Fault injection (FI) is a commonly used experimental technique to evaluate the resilience of
software techniques for tolerating hardware faults. Software-implemented FI can be …

Fault tolerant one-sided matrix decompositions on heterogeneous systems with GPUs

J Chen, H Li, S Li, X Liang, P Wu, D Tao… - … Conference for High …, 2018 - ieeexplore.ieee.org
Current algorithm-based fault tolerance (ABFT) approach for one-sided matrix
decomposition on heterogeneous systems with GPUs have following limitations: (1) they do …

Enabling Effective Error Mitigation in Memory Chips That Use On-Die Error-Correcting Codes

M Patel - arXiv preprint arXiv:2204.10387, 2022 - arxiv.org
Improvements in main memory storage density are primarily driven by process technology
scaling, which negatively impacts reliability by exacerbating various circuit-level error …

ApproxDup: Developing an Approximate Instruction Duplication Mechanism for Efficient SDC Detection in GPGPUs

X Wei, N Jiang, H Yue, X Wang, J Zhao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Nowadays, selective instruction duplication (SelDup) is the typical approach to detect silent
data corruption (SDC) in GPGPU. However, owing to the up-to-billions fault sites of parallel …

Exploring non-volatility of non-volatile memory for high performance computing under failures

J Ren, K Wu, D Li - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org
Hardware failures and faults often result in application crash in HPC. The emergence of non-
volatile memory (NVM) provides a solution to address this problem. Leveraging the …

Mitigating virtualization failures through migration to a co-located hypervisor

F Cerveira, R Barbosa, H Madeira - IEEE Access, 2021 - ieeexplore.ieee.org
Many organizations are moving their systems to the cloud, where providers consolidate
multiple clients using virtualization, which creates challenges to business-critical …

Improving application resilience by extending error correction with contextual information

A Poulos, D Wallace, R Robey… - 2018 IEEE/ACM 8th …, 2018 - ieeexplore.ieee.org
Extreme-scale systems are growing in scope and complexity as we approach exascale.
Uncorrectable faults in such systems are also increasing, so resilience efforts addressing …