Feature subset selection and order identification for unsupervised learning

JG Dy, CE Brodley - ICML, 2000 - academia.edu
Abstract
This paper explores the problem of feature subset selection for unsupervised learning within the wrapper framework. In particular, we examine feature subset selection wrapped around expectation-maximization (EM) clustering with order identification (identifying the number of clusters in the data). We investigate two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. When the "true" number of clusters k is unknown, our experiments on simulated Gaussian data and real data sets show that incorporating the search for k within the feature selection procedure achieves better "class" accuracy than fixing k to be the number of classes. There are two reasons: (1) the "true" number of Gaussian components is not necessarily equal to the number of classes, and (2) clustering with different feature subsets can result in different numbers of "true" clusters. Our empirical evaluation shows that feature selection reduces the number of features and improves clustering performance with respect to the chosen performance criteria.