C Li, Y Zhang, Y Luo - Journal of Parallel and Distributed Computing, 2022 - Elsevier
Spark is widely used due to its high-performance caching mechanism and high scalability, which still causes uneven workloads and produces useless intermediate caching results …
H Li, S Ji, H Zhong, W Wang, L Xu, Z Tang… - Science China …, 2023 - Springer
Caching is one of the most important techniques for the popular distributed big data processing framework Spark. For this big data parallel computing framework, which is …
S Wang, S Geng, Z Zhang, A Ye… - Computers …, 2019 - search.ebscohost.com
Spark is a distributed data processing framework based on memory. Memory allocation is a focus of Spark research. A good memory allocation scheme can effectively improve …
J Zhang, R Zhang, O Alfarraj, A Tolba… - Journal of Internet …, 2022 - jit.ndhu.edu.tw
Spark is currently the most widely used distributed computing framework, and its key data abstraction concept, Resilient Distributed Dataset (RDD), brings significant performance …
C Li, Q Cai, Y Luo - Cluster Computing, 2022 - Springer
Improper data replacement and inappropriate selection of job scheduling policy are important reasons for the degradation of Spark system operation speed, which directly …
H Inagaki, T Fujii, R Kawashima… - 2018 6th International …, 2018 - ieeexplore.ieee.org
Apache Spark caches reusable data into memory/disk. From our preliminary evaluation, we have found that memory-and-disk caching is ineffective compared to disk-only caching …
C Li, Q Cai, Y Luo - The Journal of Supercomputing, 2022 - Springer
Both data shuffling and cache recovery are essential parts of the Spark system, and they directly affect Spark parallel computing performance. Existing dynamic partitioning schemes …
In the era of Big Data, processing large amounts of data through data-intensive applications is presenting a challenge. An in-memory distributed computing system, Apache Spark is …
W Guo, C Huang, W Tian - Concurrency and Computation …, 2020 - Wiley Online Library
As a typical representative of distributed computing framework, Spark has been continuously developed and popularized. It reduces the data transmission time through efficient memory …