A Survey of MapReduce Optimization Techniques

黄山, 王波涛, 王国仁, 于戈, 李佳佳 - 计算机科学与探索, 2013 - cqvip.com
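As a parallel programming model for processing big data, MapReduce has attracted attention from both academia and industry
owing to its good scalability, availability, and fault tolerance. To address MapReduce's shortcomings in application domains, there already exist a large number of optimization …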

Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems

H Herodotou, E Kakoulli - ACM Transactions on Database Systems, 2023 - dl.acm.org
The use of storage tiering is becoming popular in data-intensive compute clusters due to the
recent advancements in storage technologies. The Hadoop Distributed File System, for …

Early straggler tasks detection by recurrent neural network in a heterogeneous environment

KL Bawankule, RK Dewang, AK Singh - Applied Intelligence, 2023 - Springer
Heterogeneity is common in parallel and distributed environments used for extensive
computations such as MapReduce. Stragglers are the tasks that are running on inferior …

Extreme Big Data (EBD): Next generation big data infrastructure technologies towards yottabyte/year

S Matsuoka, H Sato, O Tatebe, F Takatsu… - Supercomputing …, 2014 - superfri.org
Our claim is that so-called "Big Data" will evolve into a new era with proliferation of data from
multiple sources such as massive numbers of sensors whose resolution is increasing …

A classification of Hadoop job schedulers based on performance optimization approaches

R Ghazali, S Adabi, DG Down, A Movaghar - Cluster Computing, 2021 - Springer
Job scheduling in MapReduce plays a vital role in Hadoop performance. In recent years,
many researchers have presented job scheduler algorithms to improve Hadoop …

Tolhit – a scheduling algorithm for Hadoop cluster

M Brahmwar, M Kumar, G Sikka - Procedia Computer Science, 2016 - Elsevier
With the accretion in use of Internet in everything, a prodigious influx of data is being
observed. Use of MapReduce as a programming model has become pervasive for …

Cross-phase optimization in MapReduce

B Heintz, A Chandra, J Weissman - Cloud computing for data-intensive …, 2014 - Springer
MapReduce has proven remarkably effective for a wide variety of data-intensive
applications, but it was designed to run on large single-site homogeneous clusters …

An adaptive multi-agent system for task reallocation in a MapReduce job

Q Baert, AC Caron, M Morge, JC Routier… - Journal of Parallel and …, 2021 - Elsevier
We study the problem of task reallocation for load-balancing of MapReduce jobs in
applications that process large datasets. In this context, we propose a novel strategy based …

How Heterogeneity Affects the Design of Hadoop MapReduce Schedulers: A State-of-the-Art Survey and Challenges

V Pandey, P Saini - Big data, 2018 - liebertpub.com
MapReduce (MR) computing paradigm and its open source implementation Hadoop have
become a de facto standard to process big data in a distributed environment. Initially, the …

H-scheduler: Storage-aware task scheduling for heterogeneous-storage spark clusters

F Pan, J Xiong, Y Shen, T Wang… - 2018 IEEE 24th …, 2018 - ieeexplore.ieee.org
A trend in nowadays data centers is that heterogeneous storage devices are deployed to
meet different storage demands of various big data workloads. For example, many nodes …