Semi-supervised text classification from unlabeled documents using class associated words

H Han, DH Zhu, X Wang - 2009 International Conference on Computers & Industrial Engineering, 2009 - ieeexplore.ieee.org
Automatically classifying text documents is an important field in machine learning. Unsupervised text classification does not need training data but is often criticized for clustering blindly. Supervised text classification needs large quantities of labeled training data to achieve high accuracy. In practice, however, labeled samples are often difficult, expensive, or time-consuming to obtain. Meanwhile, unlabeled documents can be collected easily owing to the rapidly developing Internet. Class associated words are words that represent the subject of a class and provide prior knowledge of the classification for training a classifier. A learning algorithm based on the combination of Expectation-Maximization (EM) and a Naive Bayes classifier is introduced to classify fully unlabeled documents using class associated words. Experimental results show that the algorithm classifies documents with high accuracy, especially for categories with few samples. In the algorithm, class associated words are used to set classification constraints during the learning process. These constraints restrict documents to their corresponding class labels and improve classification accuracy.
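The abstract does not spell out the algorithm, so what follows is only a minimal sketch under one plausible reading. It fits a multinomial Naive Bayes model with EM on unlabeled documents. The constraint works like this: a document containing associated words of some classes may only be assigned to those classes, and a document with no such words may take any class. The function name `em_naive_bayes`, the seed-word dictionary, the Laplace smoothing parameter `alpha`, the fixed iteration count, and the toy documents are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def em_naive_bayes(docs, class_words, n_iter=10, alpha=1.0):
    """Label unlabeled docs with a multinomial Naive Bayes model fitted by EM.

    class_words maps each class to a set of (hypothetical) class associated
    words. Assumed constraint: a document containing associated words of some
    classes may only be assigned to those classes; otherwise any class is allowed.
    """
    classes = sorted(class_words)
    bags = [Counter(d.lower().split()) for d in docs]
    vocab = {w for bag in bags for w in bag}

    def allowed(bag):
        # Classes whose associated words occur in the document (all if none do).
        hits = [c for c in classes if any(w in bag for w in class_words[c])]
        return hits or classes

    def normalize(scores):
        # Turn log-scores over allowed classes into posteriors (0 for the rest).
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        return {c: math.exp(scores[c] - top) / z if c in scores else 0.0
                for c in classes}

    # Initial E-step: uniform posterior over each document's allowed classes.
    post = [normalize({c: 0.0 for c in allowed(bag)}) for bag in bags]

    for _ in range(n_iter):
        # M-step: smoothed class priors and word probabilities from soft counts.
        prior = {c: (alpha + sum(p[c] for p in post))
                 / (alpha * len(classes) + len(docs)) for c in classes}
        log_word = {}
        for c in classes:
            counts = Counter()
            for bag, p in zip(bags, post):
                for w, n in bag.items():
                    counts[w] += p[c] * n
            denom = sum(counts.values()) + alpha * len(vocab)
            log_word[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
        # E-step: posteriors restricted to the classes each document allows.
        post = [normalize({c: math.log(prior[c])
                           + sum(n * log_word[c][w] for w, n in bag.items())
                           for c in allowed(bag)})
                for bag in bags]

    return [max(p, key=p.get) for p in post]


docs = [
    "the striker scored a goal in the football match",
    "the team won the football league",
    "stock market shares fell as investors sold",
    "the bank raised interest rates on the market",
    "fans cheered the goal",
    "investors sold shares at the bank",
]
seeds = {"sports": {"football", "goal"}, "finance": {"market", "stock"}}
print(em_naive_bayes(docs, seeds))
# ['sports', 'sports', 'finance', 'finance', 'sports', 'finance']
```

In this toy run the constraint alone fixes the label of every document that contains a seed word. EM only has to infer the last document, which has no class associated word. It assigns that document to finance using word statistics learned from the constrained documents.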