Addressing failures in exascale computing

M Snir, RW Wisniewski, JA Abraham… - … Journal of High …, 2014 - journals.sagepub.com
We present here a report produced by a workshop on 'Addressing failures in exascale
computing' held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to …

SHARPE at the age of twenty two

KS Trivedi, R Sahner - ACM SIGMETRICS Performance Evaluation …, 2009 - dl.acm.org
This paper discusses the modeling tool called SHARPE (Symbolic Hierarchical Automated
Reliability and Performance Evaluator), a general hierarchical modeling tool that analyzes …

Predicting node failures in an ultra-large-scale cloud computing platform: an AIOps solution

Y Li, ZM Jiang, H Li, AE Hassan, C He… - ACM Transactions on …, 2020 - dl.acm.org
Many software services today are hosted on cloud computing platforms, such as Amazon
EC2, due to many benefits like reduced operational costs. However, node failures in these …

Software aging analysis of the Linux operating system

D Cotroneo, R Natella, R Pietrantuono… - 2010 IEEE 21st …, 2010 - ieeexplore.ieee.org
Software systems running continuously for a long time tend to show degrading performance
and an increasing failure occurrence rate, due to error conditions that accrue over time and …

PETS 2016: Dataset and challenge

L Patino, T Cane, A Vallee… - Proceedings of the IEEE …, 2016 - cv-foundation.org
This paper describes the datasets and computer vision challenges that form part of the PETS
2016 workshop. PETS 2016 addresses the application of on-board multi-sensor surveillance …

Automated performance analysis of load tests

ZM Jiang, AE Hassan, G Hamann… - 2009 IEEE International …, 2009 - ieeexplore.ieee.org
The goal of a load test is to uncover functional and performance problems of a system under
load. Performance problems refer to the situations where a system suffers from unexpectedly …

Visual analysis of complex networks for business intelligence with Gephi

S Heymann, B Le Grand - 2013 17th International Conference …, 2013 - ieeexplore.ieee.org
Platforms which combine data mining algorithms and interactive visualizations play a key
role in the discovery process from complex networks data, e.g., Web and Online Social …

A quantitative approach for the assessment of microservice architecture deployment alternatives by automated performance testing

A Avritzer, V Ferme, A Janes, B Russo, H Schulz… - … Conference on Software …, 2018 - Springer
Microservices have emerged as an architectural style for developing distributed
applications. Assessing the performance of architectural deployment alternatives is …

Software aging and rejuvenation: Where we are and where we are going

D Cotroneo, R Natella, R Pietrantuono… - 2011 IEEE Third …, 2011 - ieeexplore.ieee.org
After 16 years, a significant body of knowledge has been established in the area of Software
Aging and Rejuvenation (SAR). In this paper, we survey papers about SAR that appeared in …

Proactive process-level live migration and back migration in HPC environments

C Wang, F Mueller, C Engelmann, SL Scott - Journal of Parallel and …, 2012 - Elsevier
As the number of nodes in high-performance computing environments keeps increasing,
faults are becoming commonplace. Reactive fault tolerance (FT) often does not scale due to …