Machine learning methods for reliable resource provisioning in edge-cloud computing: A survey

TL Duc, RG Leiva, P Casari, PO Östberg - ACM Computing Surveys …, 2019 - dl.acm.org
Large-scale software systems are currently designed as distributed entities and deployed in
cloud data centers. To overcome the limitations inherent to this type of deployment …

BHyPreC: a novel Bi-LSTM based hybrid recurrent neural network model to predict the CPU workload of cloud virtual machine

ME Karim, MMS Maswood, S Das, AG Alharbi - IEEE Access, 2021 - ieeexplore.ieee.org
With the advancement of cloud computing technologies, there is an ever-increasing demand
for the maximum utilization of cloud resources. It increases the computing power …

Characterizing temperature, power, and soft-error behaviors in data center systems: Insights, challenges, and opportunities

B Nie, J Xue, S Gupta, C Engelmann… - 2017 IEEE 25th …, 2017 - ieeexplore.ieee.org
GPUs have become part of the mainstream high performance computing facilities that
increasingly require more computational power to simulate physical phenomena quickly and …

SmartHarvest: Harvesting idle CPUs safely and efficiently in the cloud

Y Wang, K Arya, M Kogias, M Vanga… - Proceedings of the …, 2021 - dl.acm.org
We can increase the efficiency of public cloud datacenters by harvesting allocated but
temporarily idling CPU cores from customer virtual machines (VMs) to run batch or analytics …

SLA-aware and energy-efficient VM consolidation in cloud data centers using robust linear regression prediction model

L Li, J Dong, D Zuo, J Wu - IEEE Access, 2019 - ieeexplore.ieee.org
Virtual machine (VM) consolidation provides a promising approach to save energy and to
improve resource utilization in data centers. However, the aggressive consolidation of virtual …

A large-scale study of soft-errors on GPUs in the field

B Nie, D Tiwari, S Gupta, E Smirni… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Parallelism provided by the GPU architecture has enabled domain scientists to simulate
physical phenomena at a much faster rate and finer granularity than what was previously …

Machine learning models for GPU error prediction in a large scale HPC system

B Nie, J Xue, S Gupta, T Patel… - 2018 48th Annual …, 2018 - ieeexplore.ieee.org
GPUs are widely deployed on large-scale HPC systems to provide powerful computational
capability for scientific applications from various domains. As those applications are …

Characterizing co-located workloads in Alibaba cloud datacenters

C Jiang, Y Qiu, W Shi, Z Ge, J Wang… - … on Cloud Computing, 2020 - ieeexplore.ieee.org
Workload characteristics are vital for both data center operation and job scheduling in
co-located data centers, where online services and batch jobs are deployed on the same …

Performance modeling and predictive scheduling for distributed stream data processing

T Li, J Tang, J Xu - IEEE Transactions on Big Data, 2016 - ieeexplore.ieee.org
In a distributed stream data processing system, an application is usually modeled using a
directed graph, in which each vertex corresponds to a data source or a processing unit, and …

A joint model for IT operation series prediction and anomaly detection

RQ Chen, GH Shi, WL Zhao, CH Liang - Neurocomputing, 2021 - Elsevier
Status prediction and anomaly detection are two fundamental tasks in automatic IT systems
monitoring. In this paper, a joint model Predictor & Anomaly Detector (PAD) is proposed to …