An Evaluation of Preprocessing Techniques for Text Classification

Authors

Ammar Ismael Kadhim

Publication date

2018/6

Journal

International Journal of Computer Science and Information Security (IJCSIS)

Volume

Issue

Pages

22-32

Publisher

Description

Text preprocessing is a vital stage in text classification (TC) in particular and in text mining in general. Text preprocessing tools reduce the multiple forms of a word to a single form. Preprocessing techniques have also received considerable attention and are widely studied in machine learning. The basic phase in text classification involves preprocessing the features and extracting the relevant features against the features in a database. These steps have a great impact on reducing the time and computing resources required. The effect of preprocessing tools on English text classification remains an open area of research. This paper provides an evaluation study of several preprocessing tools for English text classification. The study covers raw text, tokenization, stop-word removal, and stemming. Two feature extraction methods, chi-square and TF-IDF with cosine similarity score, are applied to the BBC English dataset. The experimental results show that text preprocessing affects the feature extraction methods in a way that enhances the performance of English text classification, especially for small threshold values.
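The preprocessing stages named in the abstract (tokenization, stop-word removal, stemming) followed by TF-IDF with cosine similarity can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the stop-word list, the `naive_stem` suffix stripper (a crude stand-in for a real stemmer such as Porter's), the `tf * log(N / df)` weighting, and the three sample sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "and", "to", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Strip one common suffix; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up sample sentences, not taken from the BBC dataset.
corpus = [
    "The striker scored two goals in the match",
    "The team is scoring goals and winning matches",
    "Shares dropped as the markets reacted to interest rates",
]
docs = [preprocess(text) for text in corpus]
vectors = tfidf_vectors(docs)
sim_sports = cosine(vectors[0], vectors[1])  # two sports sentences
sim_mixed = cosine(vectors[0], vectors[2])   # sports vs. business
```

In this toy corpus, stemming maps "scored"/"scoring" and "match"/"matches" to shared terms, so the two sports sentences get a positive similarity while the sports and business sentences share no terms. Without the stemming step those word pairs would not match, which is the kind of effect the study evaluates.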

Total citations

Cited by 162

Citations per year: 2019: 1, 2020: 10, 2021: 26, 2022: 41, 2023: 53, 2024: 28
