Authors
Ali Selamat, Sigeru Omatu
Publication date
2004/1/1
Journal
Information Sciences
Volume
158
Pages
69-88
Publisher
Elsevier
Description
Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web (WWW). In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network whose inputs are obtained from both principal components and class profile-based features. Each news web page is represented using a term-weighting scheme. Because the number of unique words in the collection is large, principal component analysis (PCA) is used to select the most relevant features for classification. The final output of the PCA is then combined with the feature vectors from the class profile, which contains the most regular words in each class. We manually selected the most regular words in each class and weighted them using an entropy weighting scheme. The fixed number of regular words from each class will be used as feature vectors together …
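The feature-construction step described in the abstract (term weights reduced by PCA, then concatenated with class-profile word features) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: TF-IDF stands in for the unspecified term-weighting scheme and also for the entropy weighting of class-profile words, the function names `tfidf`, `pca_reduce`, and `combine_features` are hypothetical, and the neural network classifier is omitted.

```python
import numpy as np


def tfidf(counts):
    """Turn raw term counts (n_docs x n_terms) into TF-IDF weights.

    Assumption: TF-IDF is used here as a stand-in term-weighting scheme.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    # Document frequency: number of documents containing each term.
    df = np.count_nonzero(counts, axis=0)
    idf = np.log(n_docs / np.maximum(df, 1))
    return counts * idf


def pca_reduce(x, k):
    """Project the rows of x onto the first k principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def combine_features(counts, profile_term_idx, k):
    """Concatenate PCA features with weights of hand-picked profile terms.

    profile_term_idx is a hypothetical list of column indices for the
    manually selected "regular" words of each class.
    """
    weights = tfidf(counts)
    reduced = pca_reduce(weights, k)
    profile = weights[:, profile_term_idx]
    return np.hstack([reduced, profile])


# Toy collection: 3 documents, 4 terms (made-up counts for illustration).
counts = np.array([[2, 0, 1, 0],
                   [0, 3, 0, 1],
                   [1, 1, 0, 0]])
features = combine_features(counts, profile_term_idx=[0, 3], k=2)
```

Each row of `features` would then serve as the input vector for a classifier; here the first two columns come from PCA and the last two from the profile terms.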
Total citations
[Chart: citations per year, 2004-2024]