MapReduce scheduling algorithms: a review

IAT Hashem, NB Anuar, M Marjani, E Ahmed… - The Journal of …, 2020 - Springer
Recent trends in big data have shown that the amount of data continues to increase at an
exponential rate. This trend has inspired many researchers over the past few years to …

T-storm: Traffic-aware online scheduling in storm

J Xu, Z Chen, J Tang, S Su - 2014 IEEE 34th International …, 2014 - ieeexplore.ieee.org
Storm has emerged as a promising computation platform for stream data processing. In this
paper, we first show inefficiencies of the current practice of Storm scheduling and challenges …

Classification framework of MapReduce scheduling algorithms

N Tiwari, S Sarkar, U Bellur, M Indrawan - ACM Computing Surveys …, 2015 - dl.acm.org
A MapReduce scheduling algorithm plays a critical role in managing large clusters of
hardware nodes and meeting multiple quality requirements by controlling the order and …

Maptask scheduling in mapreduce with data locality: Throughput and heavy-traffic optimality

W Wang, K Zhu, L Ying, J Tan… - IEEE/ACM Transactions …, 2014 - ieeexplore.ieee.org
MapReduce/Hadoop framework has been widely used to process large-scale datasets on
computing clusters. Scheduling map tasks with data locality consideration is crucial to the …

Model-free control for distributed stream data processing using deep reinforcement learning

T Li, Z Xu, J Tang, Y Wang - arXiv preprint arXiv:1803.01016, 2018 - arxiv.org
In this paper, we focus on general-purpose Distributed Stream Data Processing Systems
(DSDPSs), which deal with processing of unbounded streams of continuous data at scale …

Energy-aware scheduling of mapreduce jobs for big data applications

L Mashayekhy, MM Nejad, D Grosu… - IEEE transactions on …, 2014 - ieeexplore.ieee.org
The majority of large-scale data intensive applications executed by data centers are based
on MapReduce or its open-source implementation, Hadoop. Such applications are executed …

Encoded bitmap indexing for data warehouses

MC Wu, AP Buchmann - Proceedings 14th International …, 1998 - ieeexplore.ieee.org
Complex query types, huge data volumes, and very high read/update ratios make the
indexing techniques designed and tuned for traditional database systems unsuitable for …

Cost minimization for big data processing in geo-distributed data centers

L Gu, D Zeng, P Li, S Guo - IEEE transactions on Emerging …, 2014 - ieeexplore.ieee.org
The explosive growth of demands on big data processing imposes a heavy burden on
computation, storage, and communication in data centers, which hence incurs considerable …

Resource aware scheduling in a distributed computing environment

X Meng, J Tan, L Zhang - US Patent 9,201,690, 2015 - Google Patents
Systems and methods for resource aware scheduling of processes in a distributed
computing environment are described herein. One aspect provides for accessing at least …

Deep reinforcement learning enhanced greedy optimization for online scheduling of batched tasks in cloud HPC systems

Y Yang, H Shen - IEEE Transactions on Parallel and …, 2021 - ieeexplore.ieee.org
In a large cloud data center HPC system, a critical problem is how to allocate the submitted
tasks to heterogeneous servers that will achieve the goal of maximizing the system's gain …