Authors
Ammar Ismael Kadhim, Yu-N Cheah, Nurul Hashimah Ahamed, Lubab A Salman
Publication date
2014/12/16
Conference
2014 IEEE Student Conference on Research and Development
Pages
1-4
Publisher
IEEE
Description
A major challenge in topic classification (TC) is the high dimensionality of the feature space. Feature extraction (FE) therefore plays a vital role in topic classification in particular and in text mining in general. FE based on cosine similarity scores is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which can otherwise be impossible to process further. In this study, TF-IDF term weighting is used to extract features. Selecting relevant features and deciding how to encode them for a machine learning method have a large impact on that method's ability to produce a good model. Two weighting methods (TF-IDF and TF-IDF Global) were tested on the Reuters-21578 text categorization test collection. The results obtained suggest that the approach is a good candidate for improving FE performance on English topics. Simulation results on the Reuters-21578 text …
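The two techniques named in the abstract, TF-IDF term weighting and cosine similarity, can be illustrated with a minimal Python sketch. It assumes a common textbook weighting (term frequency normalized by document length, multiplied by log(N / df)), which may differ from the exact formula in the paper, and it does not attempt the "TF-IDF Global" variant, which the abstract does not define. The function names and the toy corpus are hypothetical and are not taken from the paper or from Reuters-21578.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / document length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical, for illustration only).
docs = [
    "oil prices rise as crude supply falls".split(),
    "crude oil supply falls again".split(),
    "central bank raises interest rates".split(),
]
vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # documents share terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # no terms in common
```

In this sketch the first two documents share several terms and score above zero, while the first and third share none and score exactly zero. Terms that occur in every document receive weight zero under this weighting, which is one way such a scheme discounts uninformative features.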
Total citations
[Chart: citations per year, 2016-2024]
Scholar articles
AI Kadhim, YN Cheah, NH Ahamed, LA Salman - 2014 IEEE student conference on research and …, 2014