Evaluating energy savings for checkpoint/restart

HM Sun, ST Chen, JH Yeh… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org

Authentication based on passwords is used largely in applications for computer security and
privacy. However, human actions such as choosing bad passwords and inputting passwords …

Cited by 143 · All 11 versions

Elastic reliability optimization through peer-to-peer checkpointing in cloud computing

J Zhao, Y Xiang, T Lan, HH Huang… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org

Modern day data centers coordinate hundreds of thousands of heterogeneous tasks and
aim at delivering highly reliable cloud computing services. Although offering equal reliability …

Cited by 36 · All 5 versions

To checkpoint or not to checkpoint: Understanding energy-performance-I/O tradeoffs in HPC checkpointing

N El-Sayed, B Schroeder - 2014 IEEE International Conference …, 2014 - ieeexplore.ieee.org

As the scale of high-performance computing (HPC) clusters continues to grow, their
increasing failure rates and energy consumption levels are emerging as two serious design …

Cited by 38 · All 4 versions

Power-check: An energy-efficient checkpointing framework for HPC clusters

RR Chandrasekar, A Venkatesh… - 2015 15th IEEE/ACM …, 2015 - ieeexplore.ieee.org

Checkpoint-restart is a predominantly used reactive fault-tolerance mechanism for
applications running on HPC systems. While there are innumerable studies in literature that …

Cited by 32 · All 4 versions

Understanding practical tradeoffs in HPC checkpoint-scheduling policies

N El-Sayed, B Schroeder - IEEE Transactions on Dependable …, 2016 - ieeexplore.ieee.org

As the scale of High-Performance Computing (HPC) clusters continues to grow, their
increasing failure rates and energy consumption levels are emerging as serious design …

Cited by 23 · All 2 versions

Towards resilient and energy efficient scalable Krylov solvers

Z Miao, JC Calhoun, R Ge - Parallel Computing, 2025 - Elsevier

Exascale computing must simultaneously address both energy efficiency and resilience as
power limits impact scalability and faults are more common. Unfortunately, energy efficiency …

Prediction of energy consumption by checkpoint/restart in HPC

M Morán, J Balladini, D Rexachs, E Luque - IEEE Access, 2019 - ieeexplore.ieee.org

The fault tolerance method most used today in high-performance computing (HPC) is
coordinated checkpointing. This, like any other fault tolerance method, adds additional …

Cited by 13 · All 5 versions

High Performance Computing - Power Application Programming Interface Specification Version 2.0

JH Laros, R Grant, MJ Levenhagen, SL Olivier… - 2017 - osti.gov

Measuring and controlling the power and energy consumption of high performance
computing systems by various components in the software stack is an active research area …

Cited by 23 · All 4 versions

Energy-efficient checkpointing in high-throughput cycle-stealing distributed systems

M Forshaw, AS McGough, N Thomas - Electronic Notes in Theoretical …, 2015 - Elsevier

Checkpointing is a fault-tolerance mechanism commonly used in High Throughput
Computing (HTC) environments to allow the execution of long-running computational tasks …

Cited by 20 · All 21 versions

Energy analysis and optimization for resilient scalable linear systems

Z Miao, J Calhoun, R Ge - 2018 IEEE International Conference …, 2018 - ieeexplore.ieee.org

Exascale computing must simultaneously address both energy efficiency and resilience as
power limits impact scalability and faults are more common. Unfortunately, energy efficiency …

Cited by 11 · All 3 versions