An ensemble hierarchical clustering algorithm based on merits at cluster and partition levels

Q Huang, R Gao, H Akhavan - Pattern Recognition, 2023 - Elsevier
Abstract
Ensemble clustering combines several basic clustering algorithms to achieve a high-quality final clustering. The technique is challenging, however, because primary clusters can be overlapping, vague, unstable and uncertain. Typically, ensemble clustering feeds all primary clusters from all partitions into the consensus, although the merit of an individual cluster or partition can be taken into account to improve consensus quality. A partition may score poorly on robustness as a whole while still containing some high-quality clusters. Inspired by evaluation at both the cluster and the partition level, this paper proposes an ensemble hierarchical clustering algorithm based on a cluster consensus selection approach. The emphasis is on selecting a subset of primary clusters from the partitions according to their merit level. Merit is defined using an extension of the Normalized Mutual Information (NMI) measure. Clusters produced by the basic clustering algorithms that satisfy a predefined threshold on this measure are selected to participate in the final consensus. The consensus over the selected primary clusters is then performed by clustering the clusters: the selected primary clusters are re-clustered to create hyper-clusters. Finally, the final clusters are formed by assigning each instance to the hyper-cluster with which it has the highest similarity. An innovative similarity criterion based on merit and cluster size is presented for this step. The performance of the proposed algorithm is demonstrated through extensive experiments on real-world datasets from the UCI repository, in comparison with state-of-the-art algorithms such as CPDM, ENMI, IDEA, CFTLC and SSCEN.
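The pipeline outlined in the abstract (score each primary cluster, keep those above a threshold, re-cluster the survivors into hyper-clusters, then assign instances) can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract does not give the exact formulas. The following are assumptions made purely for illustration: merit is taken as the mean NMI between a cluster's in/out indicator and every other partition, hyper-clusters come from average-linkage merging on Jaccard similarity, and the instance-to-hyper-cluster similarity weights each cluster by `merit * size`. All function names and the `threshold` and `k` parameters are hypothetical.

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information of two label vectors (arithmetic-mean normalization)."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    cont = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(cont, (ai, bi), 1)              # contingency table
    p = cont / cont.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz])).sum()
    ha = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()
    hb = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()
    denom = (ha + hb) / 2
    return mi / denom if denom > 0 else 0.0

def cluster_merits(partitions):
    """Indicator matrix and merit per primary cluster (ASSUMED merit: mean NMI vs. other partitions)."""
    partitions = [np.asarray(p) for p in partitions]
    indicators, merits = [], []
    for idx, labels in enumerate(partitions):
        others = [q for j, q in enumerate(partitions) if j != idx]
        for c in np.unique(labels):
            member = labels == c
            indicators.append(member)
            merits.append(np.mean([nmi(member, q) for q in others]))
    return np.array(indicators, dtype=float), np.array(merits)

def agglomerate(sim, k):
    """Average-linkage merging of items until at most k groups remain."""
    groups = [[i] for i in range(len(sim))]
    while len(groups) > k:
        best, pair = -1.0, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                s = sim[np.ix_(groups[i], groups[j])].mean()
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        merged = groups.pop(j)                # j > i, so index i stays valid
        groups[i].extend(merged)
    return groups

def ensemble_consensus(partitions, threshold=0.5, k=2):
    """Select clusters by merit, build hyper-clusters, assign instances.

    Assumes threshold > 0 so that every kept cluster has positive weight.
    """
    members, merits = cluster_merits(partitions)
    keep = merits >= threshold                # cluster selection step
    members, merits = members[keep], merits[keep]
    sizes = members.sum(axis=1)
    inter = members @ members.T
    jaccard = inter / (sizes[:, None] + sizes[None, :] - inter)
    groups = agglomerate(jaccard, k)          # hyper-clusters of primary clusters
    weights = merits * sizes                  # ASSUMED merit-and-size weighting
    scores = np.stack([
        (weights[g][:, None] * members[g]).sum(axis=0) / weights[g].sum()
        for g in groups
    ])
    # Instances covered by no selected cluster score 0 everywhere and fall to label 0.
    return scores.argmax(axis=0), int(keep.sum())
```

Calling `ensemble_consensus(partitions, threshold=0.5, k=2)` on a list of equal-length label vectors (at least two partitions) returns the final labels together with the number of primary clusters that passed the threshold. The nested-loop merging is quadratic per step and is meant for readability on small ensembles, not for performance.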