Intermediate data placement and cache replacement strategy under Spark platform

C Li, Y Zhang, Y Luo - Journal of Parallel and Distributed Computing, 2022 - Elsevier
Spark is widely used due to its high performance caching mechanism and high scalability,
which still causes uneven workloads and produces useless intermediate caching results …

Memory management approaches in apache spark: A review

M Dessokey, SM Saif, S Salem, E Saad… - … Conference on Advanced …, 2020 - Springer
In the era of Big Data, processing large amounts of data through data-intensive applications,
is presenting a challenge. An in-memory distributed computing system; Apache Spark is …

BDEv 3.0: energy efficiency and microarchitectural characterization of Big Data processing frameworks

J Veiga, J Enes, RR Expósito, J Tourino - Future Generation Computer …, 2018 - Elsevier
As the size of Big Data workloads keeps increasing, the evaluation of distributed frameworks
becomes a crucial task in order to identify potential performance bottlenecks that may delay …

LPW: an efficient data-aware cache replacement strategy for Apache Spark

H Li, S Ji, H Zhong, W Wang, L Xu, Z Tang… - Science China …, 2023 - Springer
Caching is one of the most important techniques for the popular distributed big data
processing framework Spark. For this big data parallel computing framework, which is …

Caching in the Multiverse

M Abdi, A Mosayyebzadeh, MH Hajkazemi… - 11th USENIX Workshop …, 2019 - usenix.org
To get good performance for data stored in Object storage services like S3, data analysis
clusters need to cache data locally. Recently these caches have started taking into account …

Blaze: Holistic Caching for Iterative Data Processing.

WW Song, J Eo, T Um, M Jeon, BG Chun - EuroSys, 2024 - wonook.github.io
Modern data processing workloads, such as machine learning and graph processing,
involve iterative computations to converge generated models into higher accuracy. An …

SAC: Dynamic Caching upon Sketch for In-Memory Big Data Analytics

M Ji, M Zhou, H Zou, M Tang, Z Qian… - 2023 9th International …, 2023 - ieeexplore.ieee.org
Caching intermediate results in memory, instead of flushing them to disks, actually shortens
the completion of big data analytics, because there is no need to reload them for follow-up …

Towards dependency-aware cache management for data analytics applications

Y Yu, C Zhang, W Wang, J Zhang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Memory caches are being used aggressively in today's data analytics systems such as
Spark, Tez, and Piccolo. The significant performance impact of caches and their limited sizes …

Adaptive Control of Apache Spark's Data Caching Mechanism Based on Workload Characteristics

H Inagaki, T Fujii, R Kawashima… - 2018 6th International …, 2018 - ieeexplore.ieee.org
Apache Spark caches reusable data into memory/disk. From our preliminary evaluation, we
have found that a memory-and-disk caching is ineffective compared to disk-only caching …

A Code Caching Method for Industrial Software Services

H Deng, C Lit, Z Qi, P Liu, W Qian… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
In the context of the Industrial Internet of Things (IIoT), industrial software services have
significantly revolutionized conventional industrial manufacturing processes by integrating …