S Deng, Z He, X Xu - Knowledge-Based Systems, 2010 - Elsevier
Identification of meaningful clusters from categorical data is one key problem in data mining. Recently, Average Normalized Mutual Information (ANMI) has been used to define …
N Iam-On, T Boongeon, S Garrett… - IEEE Transactions on …, 2010 - ieeexplore.ieee.org
Although attempts have been made to solve the problem of clustering categorical data via cluster ensembles, with the results being competitive to conventional algorithms, it is …
H Qin, X Ma, T Herawan, JM Zain - Knowledge-Based Systems, 2014 - Elsevier
Categorical data clustering has attracted much attention recently due to the fact that much of the data contained in today's databases is categorical in nature. While many algorithms for …
Despite recent efforts, the challenge in clustering categorical and mixed data in the context of big data still remains due to the lack of an inherently meaningful measure of similarity …
Z He, X Xu, S Deng, B Dong - arXiv preprint cs/0509033, 2005 - arxiv.org
Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-histogram, a new efficient algorithm for …
Z He, X Xu, S Deng - Information Fusion, 2005 - Elsevier
Categorical data clustering (CDC) and cluster ensemble (CE) have long been considered as separate research and application areas. The main focus of this paper is to investigate the …
Z He, X Xu, S Deng - arXiv preprint cs/0509011, 2005 - arxiv.org
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that …
The performance of a partitional clustering algorithm is influenced by the initial random choice of cluster centers. Different runs of the clustering algorithm on the same data set often …
T Xiong, S Wang, A Mayers, E Monga - Data Mining and Knowledge …, 2012 - Springer
Clustering categorical data poses two challenges: defining an inherently meaningful similarity measure, and effectively dealing with clusters which are often embedded in …