… soft errors are linked to several temporal and spatial features, … As we progressing towards exascale, applications are going to … resilience in GPUs and demonstrate that the ratio of silent …
… that exascale systems will experience faults and errors more fre… However, silent data corruption (SDC) might require more … a global memory address space partitioned across the nodes. …
A Müller, W Deconinck, C Kühnlein… - Geoscientific Model …, 2019 - gmd.copernicus.org
… ESCAPE strategy: (i) identify domain-specific key algorithmic … the optimisations, whereas the error measures verify to what … most efficiently in grid point space, while horizontal gradients…
… are designed/utilized towards exascale computing, inevitably … on one-day period (27-March-2017) due to space limit and … using partial labels based on PU learning and Support Vector …
… be more susceptible to soft errors, e.g., silent data corruptions, … is able to detect errors online soon after the error occurs so … for timely detection, faster recovery and less space overhead …
… different ML algorithms (e.g., support vector machines, k-Nearest … To inject faults and check for errors, this work introduces … investigations during early design space explorations process, …
A Bustos, AJ Rubio-Montero, R Méndez… - The Journal of …, 2023 - Springer
… silent data corruption detectors by leveraging support vector … The right replication level to detect and correct silent errors at … their spatial locality and providing the mean relative error (…
… scientists with expertise in exascale computing to discuss novel … for detection, containment and mitigation of silent data … -resolution in space or time and the error estimators themselves …
D Jauk, D Yang, M Schulz - … of the International Conference for High …, 2019 - dl.acm.org
… As we near exascale, resilience remains a major technical … spatial and temporal correlation of memory errors to identify … RIPPER and the support vector machine perform reasonably well …