Big data clustering techniques based on Spark: a literature review

MM Saeed, Z Al Aghbari, M Alsharidah - PeerJ Computer Science, 2020 - peerj.com
A popular unsupervised learning method, known as clustering, is extensively used in data
mining, machine learning and pattern recognition. The procedure involves grouping of …

Scalable machine‐learning algorithms for big data analytics: a comprehensive review

P Gupta, A Sharma, R Jindal - Wiley Interdisciplinary Reviews …, 2016 - Wiley Online Library
Big data analytics is one of the emerging technologies as it promises to provide better
insights from huge and heterogeneous data. Big data analytics involves selecting the …

Affinity clustering: Hierarchical clustering at scale

MH Bateni, S Behnezhad… - Advances in …, 2017 - proceedings.neurips.cc
Graph clustering is a fundamental task in many data-mining and machine-learning
pipelines. In particular, identifying a good hierarchical structure is at the same time a …

A three-way cluster ensemble approach for large-scale data

H Yu, Y Chen, P Lingras, G Wang - International Journal of Approximate …, 2019 - Elsevier
Cluster ensemble has emerged as a powerful technique for combining multiple clustering
results. To address the problem of clustering on large-scale data, this paper presents an …

Scalable clustering by aggregating representatives in hierarchical groups

WB Xie, Z Liu, D Das, B Chen, J Srivastava - Pattern Recognition, 2023 - Elsevier
Appropriately handling the scalability of clustering is a long-standing challenge for the study
of clustering techniques and is of fundamental interest to researchers in the community of …

Scalable hierarchical agglomerative clustering

N Monath, KA Dubey, G Guruganesh… - Proceedings of the 27th …, 2021 - dl.acm.org
The applicability of agglomerative clustering, for inferring both hierarchical and flat
clustering, is limited by its scalability. Existing scalable hierarchical clustering methods …

Massively parallel algorithms and hardness for single-linkage clustering under ℓp-distances

G Yaroslavtsev, A Vadapalli - … Conference on Machine Learning (ICML'18 …, 2018 - par.nsf.gov
We present the first massively parallel (MPC) algorithms and hardness of approximation results
for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under …

AICCA: AI-driven cloud classification atlas

T Kurihana, EJ Moyer, IT Foster - Remote Sensing, 2022 - mdpi.com
Clouds play an important role in the Earth's energy budget, and their behavior is one of the
largest uncertainties in future climate projections. Satellite observations should help in …

TeraHAC: Hierarchical agglomerative clustering of trillion-edge graphs

L Dhulipala, J Łącki, J Lee, V Mirrokni - … of the ACM on Management of …, 2023 - dl.acm.org
We introduce TeraHAC, a (1+ε)-approximate hierarchical agglomerative clustering (HAC)
algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to …

A survey of parallel clustering algorithms based on Spark

W Xiao, J Hu - Scientific Programming, 2020 - Wiley Online Library
Clustering is one of the most important unsupervised machine learning tasks, which is
widely used in information retrieval, social network analysis, image processing, and other …