Y Liang, Y Zhang, H Xiong… - … Conference on Data …, 2007 - ieeexplore.ieee.org
Frequent failures are becoming a serious concern to the community of high-end computing, especially when the applications and the underlying systems rapidly grow in size and …
AB Nagarajan, F Mueller, C Engelmann… - Proceedings of the 21st …, 2007 - dl.acm.org
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace …
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L which can …
As the number of nodes in high-performance computing environments keeps increasing, faults are becoming commonplace. Reactive fault tolerance (FT) often does not scale due to …
X Chen, CD Lu, K Pattabiraman - 2014 IEEE 25th International …, 2014 - ieeexplore.ieee.org
In this paper, we analyze a workload trace from the Google cloud cluster and characterize the observed failures. The goal of our work is to improve the understanding of failures in …
In modern cloud computing systems, hundreds and even thousands of cloud servers are interconnected by multi-layer networks. In such large-scale and complex systems, failures …
W Tang, Z Lan, N Desai… - 2009 IEEE International …, 2009 - ieeexplore.ieee.org
Job scheduling on large-scale systems is an increasingly complicated affair, with numerous factors influencing scheduling policy. Addressing these concerns results in sophisticated …
Z Zheng, L Yu, W Tang, Z Lan, R Gupta… - … parallel & distributed …, 2011 - ieeexplore.ieee.org
With the growth of system size and complexity, reliability has become of paramount importance for petascale systems. Reliability, Availability, and Serviceability (RAS) logs …
AZ Zeyuan, C Weizhu, W Gang… - 2009 Ninth IEEE …, 2009 - ieeexplore.ieee.org
It is an extreme challenge to produce a nonlinear SVM classifier on very large scale data. In this paper we describe a novel P-packSVM algorithm that can solve the Support Vector …