Effective data management strategy and RDD weight cache replacement strategy in Spark

K Jiang, S Du, F Zhao, Y Huang, C Li, Y Luo - Computer Communications, 2022 - Elsevier
With the dramatic increase in internet users and their demand for real-time network
performance, the Spark distributed computing environment has emerged. It is widely used …
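The entry above concerns weight-based RDD cache replacement. As a minimal illustration of the general idea (not the paper's actual algorithm), a cached partition can be scored by a hypothetical weight combining recomputation cost, expected reuse, and size, evicting the lowest-weight entries first; all names and the weight formula here are assumptions for illustration.

```python
# Sketch of weight-based RDD cache replacement (illustrative only, not the
# paper's method): each cached RDD gets a weight from its computation cost,
# reuse frequency, and size; lowest-weight entries are evicted first.
from dataclasses import dataclass

@dataclass
class CachedRDD:
    name: str
    compute_cost: float   # time to recompute the RDD (seconds)
    ref_count: int        # expected future references
    size_mb: float        # memory footprint in MB

def weight(rdd: CachedRDD) -> float:
    # Hypothetical weight: expensive-to-recompute, frequently reused,
    # small RDDs are the most valuable to keep in memory.
    return rdd.compute_cost * rdd.ref_count / rdd.size_mb

def evict_until_fits(cache: list[CachedRDD], capacity_mb: float) -> list[CachedRDD]:
    """Drop the lowest-weight entries until the cache fits in capacity_mb."""
    kept = sorted(cache, key=weight, reverse=True)
    while kept and sum(r.size_mb for r in kept) > capacity_mb:
        kept.pop()  # lowest weight sits at the end of the sorted list
    return kept
```

For example, with a 160 MB budget, an RDD that is cheap to recompute and rarely reused is evicted before a small, expensive, frequently reused one.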

Intermediate data placement and cache replacement strategy under Spark platform

C Li, Y Zhang, Y Luo - Journal of Parallel and Distributed Computing, 2022 - Elsevier
Spark is widely used for its high-performance caching mechanism and high scalability,
yet it still suffers from uneven workloads and produces useless intermediate caching results …

LPW: an efficient data-aware cache replacement strategy for Apache Spark

H Li, S Ji, H Zhong, W Wang, L Xu, Z Tang… - Science China …, 2023 - Springer
Caching is one of the most important techniques for the popular distributed big data
processing framework Spark. For this big data parallel computing framework, which is …

A Dynamic Memory Allocation Optimization Mechanism Based on Spark

S Wang, S Geng, Z Zhang, A Ye… - Computers …, 2019 - search.ebscohost.com
Spark is a memory-based distributed data processing framework. Memory allocation is a
central question in Spark research. A good memory allocation scheme can effectively improve …

A memory-aware spark cache replacement strategy

J Zhang, R Zhang, O Alfarraj, A Tolba… - Journal of Internet …, 2022 - jit.ndhu.edu.tw
Spark is currently the most widely used distributed computing framework, and its key data
abstraction concept, Resilient Distributed Dataset (RDD), brings significant performance …

Dynamic data replacement and adaptive scheduling policies in spark

C Li, Q Cai, Y Luo - Cluster Computing, 2022 - Springer
Improper data replacement and inappropriate selection of job scheduling policy are
important causes of degraded Spark execution speed, which directly …

Adaptive Control of Apache Spark's Data Caching Mechanism Based on Workload Characteristics

H Inagaki, T Fujii, R Kawashima… - 2018 6th International …, 2018 - ieeexplore.ieee.org
Apache Spark caches reusable data into memory/disk. From our preliminary evaluation, we
have found that memory-and-disk caching is ineffective compared to disk-only caching …
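The finding above suggests choosing a storage level from workload characteristics rather than always using memory-and-disk. The sketch below is a toy heuristic, not the paper's controller: when most of an RDD would spill to disk anyway, caching it disk-only may avoid the overhead of partial in-memory caching. The threshold and function are assumptions for illustration.

```python
# Toy heuristic (illustrative only) for picking a Spark storage level based
# on how much of the RDD fits in free memory: if the overflow fraction is
# large, most partitions would spill and be re-read at disk speed anyway,
# so disk-only caching sidesteps the partial in-memory bookkeeping.
def choose_storage_level(rdd_size_mb: float, free_memory_mb: float,
                         overflow_threshold: float = 0.5) -> str:
    """Return a storage-level name from the fraction of the RDD that overflows."""
    if rdd_size_mb <= free_memory_mb:
        return "MEMORY_ONLY"
    overflow = (rdd_size_mb - free_memory_mb) / rdd_size_mb
    return "DISK_ONLY" if overflow > overflow_threshold else "MEMORY_AND_DISK"
```

The returned name corresponds to the levels Spark exposes via `StorageLevel` when calling `rdd.persist(...)`.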

Data balancing-based intermediate data partitioning and check point-based cache recovery in Spark environment

C Li, Q Cai, Y Luo - The Journal of Supercomputing, 2022 - Springer
Both data shuffling and cache recovery are essential parts of the Spark system, and they
directly affect Spark parallel computing performance. Existing dynamic partitioning schemes …

Memory management approaches in apache spark: A review

M Dessokey, SM Saif, S Salem, E Saad… - … Conference on Advanced …, 2020 - Springer
In the era of Big Data, processing large amounts of data through data-intensive applications
presents a challenge. Apache Spark, an in-memory distributed computing system, is …

Handling data skew at reduce stage in Spark by ReducePartition

W Guo, C Huang, W Tian - Concurrency and Computation …, 2020 - Wiley Online Library
As a typical representative of distributed computing frameworks, Spark has been continuously
developed and popularized. It reduces data transmission time through efficient memory …
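The entry above addresses reduce-stage data skew. The snippet below only illustrates the problem the paper targets (not its ReducePartition method): with a plain hash partitioner, all records sharing one hot key land on a single reduce partition, so that reducer dominates the stage's runtime.

```python
# Illustration of reduce-stage skew under hash partitioning: a hot key is
# hashed to exactly one partition, concentrating 90% of the records there.
from collections import Counter

def partition(key: str, num_partitions: int = 4) -> int:
    # Plain hash partitioning, as a default shuffle would do.
    return hash(key) % num_partitions

keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]  # 90% of records share one key
loads = Counter(partition(k) for k in keys)
# The partition holding "hot" receives at least 90 of the 100 records.
```

Skew-mitigation schemes (key splitting, range repartitioning, and the like) redistribute such hot keys across reducers instead.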