Intermediate data placement and cache replacement strategy under Spark platform

C Li, Y Zhang, Y Luo - Journal of Parallel and Distributed Computing, 2022 - Elsevier
Spark is widely used due to its high performance caching mechanism and high scalability,
which still causes uneven workloads and produces useless intermediate caching results …

An overview on cloud computing platform Spark for Human Genome mining

D Ding, D Wu, F Yu - 2016 IEEE International Conference on …, 2016 - ieeexplore.ieee.org
The development of the Human Genome Project provides the important technical guarantee
for people's health. The gene sequencing service plays an important role in the disease …

Profile-based power-aware workflow scheduling framework for energy-efficient data centers

B Qureshi - Future Generation Computer Systems, 2019 - Elsevier
In the age of big data, software-as-a-service (SaaS) clouds provide heterogeneous and
multitenant utilization of underlying virtual environments in data centers. Real-time and …

Intermediate data caching optimization for multi-stage and parallel big data frameworks

Z Yang, D Jia, S Ioannidis, N Mi… - 2018 IEEE 11th …, 2018 - ieeexplore.ieee.org
In the era of big data and cloud computing, large amounts of data are generated from user
applications and need to be processed in the datacenter. Data-parallel computing …

Memory management approaches in Apache Spark: A review

M Dessokey, SM Saif, S Salem, E Saad… - … Conference on Advanced …, 2020 - Springer
In the era of Big Data, processing large amounts of data through data-intensive applications,
is presenting a challenge. An in-memory distributed computing system; Apache Spark is …

Virtual service placement for edge computing under finite memory and bandwidth

S He, X Lyu, W Ni, H Tian, RP Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Edge computing allows an edge server to adaptively place virtual instances to serve
different types of data. This article presents a new algorithm which jointly optimizes virtual …

A fast large-scale path planning method on lunar DEM using distributed tile pyramid strategy

Z Hong, B Tu, X Tong, H Pan, R Zhou… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
In lunar exploration missions, path planning for lunar rovers using digital elevation models
(DEMs) is currently a hot topic in academic research. However, research on path planning …

Understanding and improving disk-based intermediate data caching in Spark

K Zhang, Y Tanimura, H Nakada… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Apache Spark is a parallel data processing framework that executes fast for iterative
calculations and interactive processing, by caching intermediate data in memory with a …

Dynamic data replacement and adaptive scheduling policies in Spark

C Li, Q Cai, Y Luo - Cluster Computing, 2022 - Springer
Improper data replacement and inappropriate selection of job scheduling policy are
important reasons for the degradation of Spark system operation speed, which directly …

LCRC: a dependency-aware cache management policy for Spark

B Wang, J Tang, R Zhang, W Ding… - 2018 IEEE Intl Conf on …, 2018 - ieeexplore.ieee.org
Memory is a constrained resource for in-memory big data computing systems. Efficient
memory management plays a pivotal role in performance improvement for these systems …