Related articles - Academic resource search

Optimization of the join between large tables in the spark distributed framework

X Wu, Y He - Applied Sciences, 2023 - mdpi.com

The Join task between Spark large tables takes a long time to run and produces a lot of disk
I/O, network I/O and disk occupation in the Shuffle process. This paper proposes a …

Cited by 1 - Related articles - All 3 versions

A theoretical and experimental comparison of large-scale join algorithms in spark

AC Phan, TC Phan, TN Trieu, TTQ Tran - SN Computer Science, 2021 - Springer

Currently, the estimated amount of data created daily have reached the threshold of
petabytes or even zettabytes globally. It is no wonder that traditional data processing …

Cited by 3 - Related articles - All 2 versions

A comparative study of join algorithms in spark

AC Phan, TC Phan, TN Trieu - Future Data and Security Engineering: 7th …, 2020 - Springer

In the era of information explosion, the amount of data generated is increasing day by day,
reached the threshold of petabytes or even zettabytes. In order to extract useful information …

Cited by 5 - Related articles - All 2 versions

Join algorithms under apache spark: revisited

A Al-Badarneh - Proceedings of the 2019 5th International Conference …, 2019 - dl.acm.org

Currently, we are dealing with large scale applications, which in turn generate massive
amount of data and information. Large amount of data often requires processing algorithms …

Cited by 3 - Related articles

A Spark Join Algorithm Based on Memory Monitoring and Batch Processing

C Kefei, L Zhao, Z Ke, D Xianjun… - 2018 IEEE 9th …, 2018 - ieeexplore.ieee.org

In recent years, the Spark memory computing framework has risen rapidly, and the data
processing speed has been greatly improved. However, the upper limit of speed is limited by …

Cited by 1 - Related articles

Optimization of data distribution strategy in theta-join process based on spark

S Cao, E Haihong, M Song, K Zhang - Proceedings of the 2nd …, 2018 - dl.acm.org

The theta-join between tables is a common operation in the data query and statistical
analysis. When dealing with large amounts of data, it will produce a great deal of cost. The …

Cited by 2 - Related articles

[PDF] arxiv.org

Approximate distributed joins in apache spark

DL Quoc, IE Akkus, P Bhatotia, S Blanas… - arXiv preprint arXiv …, 2018 - arxiv.org

The join operation is a fundamental building block of parallel data processing. Unfortunately,
it is very resource-intensive to compute an equi-join across massive datasets. The …

Cited by 7 - Related articles - All 2 versions

Utilizing page-level join index for optimization in parallel join execution

C Lee, ZA Chang - IEEE transactions on knowledge and data …, 1995 - ieeexplore.ieee.org

This paper presents a methodology for the optimization of parallel join execution. Past
research on parallel join methods mostly focused on the design of algorithms for partitioning …

Cited by 17 - Related articles - All 8 versions

PI-Join: Efficiently processing join queries on massive data

X Han, J Li, D Yang - Knowledge and information systems, 2012 - Springer

The ratio of disk capacity to disk transfer rate typically increases by 10× per decade. As a
result, disk is becoming slower from the view of applications because of the much larger …

Cited by 9 - Related articles - All 10 versions

[PDF] ieee.org

Distributed join processing between streaming and stored big data under the micro-batch model

YH Jeon, KH Lee, HJ Kim - IEEE Access, 2019 - ieeexplore.ieee.org

In order to interpret, enrich, and analyze the streaming data, stream applications often
access the data stored in an external database. Although there has been a lot of studies on …

Cited by 20 - Related articles - All 3 versions
