Leveraging adaptive I/O to optimize collective data shuffling patterns for big data analytics

A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications

JS Ng, WYB Lim, NC Luong, Z Xiong… - … Surveys & Tutorials, 2021 - ieeexplore.ieee.org

Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …

Cited by 88 - Related articles

[PDF] arxiv.org

A survey on Spark ecosystem: Big data processing infrastructure, machine learning, and applications

S Tang, B He, C Yu, Y Li, K Li - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

With the explosive increase of big data in industry and academic fields, it is important to
apply large-scale data processing systems to analyze Big Data. Arguably, Spark is the state …

Cited by 96 - Related articles - All 6 versions

Intermediate data placement and cache replacement strategy under Spark platform

C Li, Y Zhang, Y Luo - Journal of Parallel and Distributed Computing, 2022 - Elsevier

Spark is widely used due to its high performance caching mechanism and high scalability,
which still causes uneven workloads and produces useless intermediate caching results …

Cited by 27 - Related articles - All 2 versions

A novel hybrid approach for multi-dimensional data anonymization for Apache Spark

SU Bazai, J Jang-Jaccard, H Alavizadeh - ACM Transactions on Privacy …, 2021 - dl.acm.org

Multi-dimensional data anonymization approaches (e.g., Mondrian) ensure more fine-grained
data privacy by providing a different anonymization strategy applied for each attribute. Many …

Cited by 21 - Related articles

[PDF] arxiv.org

A survey of coded distributed computing

JS Ng, WYB Lim, NC Luong, Z Xiong… - arXiv preprint arXiv …, 2020 - arxiv.org

Distributed computing has become a common approach for large-scale computation of tasks
due to benefits such as high reliability, scalability, computation speed, and cost-effectiveness …

Cited by 27 - Related articles - All 2 versions

[PDF] ieee.org

Performance model of MapReduce iterative applications for hybrid cloud bursting

FJ Clemente-Castelló, B Nicolae… - … on Parallel and …, 2018 - ieeexplore.ieee.org

Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost the overall
capacity during peak utilization) can be a cost-effective way to deal with the increasing …

Cited by 29 - Related articles - All 9 versions

[PDF] ieee.org

Toward high-performance computing and big data analytics convergence: The case of Spark-DIY

S Caino-Lores, J Carretero, B Nicolae, O Yildiz… - IEEE …, 2019 - ieeexplore.ieee.org

Convergence between high-performance computing (HPC) and big data analytics (BDA) is
currently an established research area that has spawned new opportunities for unifying the …

Cited by 16 - Related articles - All 7 versions

[PDF] anl.gov

Spark-DIY: A framework for interoperable Spark operations with high performance block-based data models

S Caíno-Lores, J Carretero, B Nicolae… - 2018 IEEE/ACM 5th …, 2018 - ieeexplore.ieee.org

Today's scientific applications are increasingly relying on a variety of data sources, storage
facilities, and computing infrastructures, and there is a growing demand for data analysis …

Cited by 19 - Related articles - All 4 versions

[PDF] academia.edu

A performance study of big data workloads in cloud datacenters with network variability

A Uta, H Obaseki - Companion of the 2018 ACM/SPEC International …, 2018 - dl.acm.org

Public cloud computing platforms are a cost-effective solution for individuals and
organizations to deploy various types of workloads, ranging from scientific applications …

Cited by 16 - Related articles - All 8 versions

Improving the robustness and performance of parallel joins over distributed systems

L Cheng, S Kotoulas, TE Ward… - Journal of Parallel and …, 2017 - Elsevier

High-performance data processing systems typically utilize numerous servers with large
amounts of memory. An essential operation in such an environment is the parallel join, the …

Cited by 16 - Related articles - All 4 versions
