Authors
Jean-Patrick Baudry, Adrian E Raftery, Gilles Celeux, Kenneth Lo, Raphael Gottardo
Publication date
2010/1/1
Journal
Journal of Computational and Graphical Statistics
Volume
19
Issue
2
Pages
332-353
Publisher
Taylor & Francis
Description
Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an …
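The first step named in the description, choosing the number of Gaussian mixture components with BIC, can be illustrated with a minimal sketch. It is not the authors' implementation: it is restricted to one-dimensional data, uses a plain EM loop with quantile-based starting values, and adopts the convention BIC = -2 log L + p log n (lower is better), all of which are assumptions made here. The function names (`fit_gmm_1d`, `select_k_by_bic`, `make_demo_data`) are hypothetical. The hierarchical combining step is truncated in the description and is not attempted.

```python
import math
import random


def log_likelihood(xs, weights, means, variances):
    """Log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit_gmm_1d(xs, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return the final log-likelihood."""
    n = len(xs)
    srt = sorted(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    # Assumption: start the means at evenly spaced sample quantiles.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    floor = 1e-3 * var_all  # variance floor to avoid degenerate components
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            tot = max(sum(dens), 1e-300)
            resp.append([d / tot for d in dens])
        # M-step: update weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, floor)
    return log_likelihood(xs, weights, means, variances)


def select_k_by_bic(xs, candidates):
    """Return {k: BIC} with BIC = -2 log L + p log n (lower is better)."""
    n = len(xs)
    scores = {}
    for k in candidates:
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        scores[k] = -2.0 * fit_gmm_1d(xs, k) + n_params * math.log(n)
    return scores


def make_demo_data(seed=0):
    """One Gaussian cluster plus one uniform (non-Gaussian) cluster."""
    rng = random.Random(seed)
    gaussian_cluster = [rng.gauss(0.0, 1.0) for _ in range(200)]
    uniform_cluster = [rng.uniform(6.0, 12.0) for _ in range(200)]
    return gaussian_cluster + uniform_cluster


if __name__ == "__main__":
    bic = select_k_by_bic(make_demo_data(), range(1, 6))
    for k, score in bic.items():
        print(k, round(score, 1))
    print("selected K:", min(bic, key=bic.get))
```

The demo data contain two clusters, one of them non-Gaussian. Comparing the selected K with 2 on data like these is one way to see the situation the description refers to, where BIC tracks the quality of the density approximation rather than the number of clusters.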
Total citations
[Chart: citations per year, 2009 to 2024]
Scholar articles
JP Baudry, AE Raftery, G Celeux, K Lo, R Gottardo - Journal of Computational and Graphical Statistics, 2010