S Tang, B He, C Yu, Y Li, K Li - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
With the explosive increase of big data in industry and academic fields, it is important to apply large-scale data processing systems to analyze Big Data. Arguably, Spark is the state …
C Li, Y Zhang, Y Luo - Journal of Parallel and Distributed Computing, 2022 - Elsevier
Spark is widely used due to its high-performance caching mechanism and high scalability, which still causes uneven workloads and produces useless intermediate caching results …
Multi-dimensional data anonymization approaches (e.g., Mondrian) ensure more fine-grained data privacy by providing a different anonymization strategy applied for each attribute. Many …
Distributed computing has become a common approach for large-scale computation of tasks due to benefits such as high reliability, scalability, computation speed, and cost-effectiveness …
FJ Clemente-Castelló, B Nicolae… - … on Parallel and …, 2018 - ieeexplore.ieee.org
Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost the overall capacity during peak utilization) can be a cost-effective way to deal with the increasing …
Convergence between high-performance computing (HPC) and big data analytics (BDA) is currently an established research area that has spawned new opportunities for unifying the …
Today's scientific applications are increasingly relying on a variety of data sources, storage facilities, and computing infrastructures, and there is a growing demand for data analysis …
A Uta, H Obaseki - Companion of the 2018 ACM/SPEC International …, 2018 - dl.acm.org
Public cloud computing platforms are a cost-effective solution for individuals and organizations to deploy various types of workloads, ranging from scientific applications …
High-performance data processing systems typically utilize numerous servers with large amounts of memory. An essential operation in such an environment is the parallel join, the …