Authors
Mostafa Keikha, Ahmad Khonsari, Farhad Oroumchian
Publication date
2009/1/1
Journal
Knowledge-Based Systems
Volume
22
Issue
1
Pages
67-71
Publisher
Elsevier
Description
Three factors are involved in text classification: the classification model, the similarity measure, and the document representation model. In this paper, we focus on document representation and demonstrate that the choice of document representation has a profound impact on the quality of the classifier. In our experiments, we used the centroid-based text classifier, which is a simple and robust text classification scheme. We compare four types of document representation: N-grams, single terms, phrases, and RDR, a logic-based document representation. The N-gram representation is a string-based representation with no linguistic processing. The single-term approach is based on words with minimal linguistic processing. The phrase approach is based on linguistically formed phrases and single words. The RDR is based on linguistic processing and representing documents as a …
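As a rough illustration of the setup the description outlines, the sketch below pairs a centroid-based classifier with two of the simpler representations, character N-grams and single terms. It is not the authors' implementation: the raw-count weighting, the cosine similarity measure, the trigram length, the function names, and the toy documents are all assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """String-based representation: counts of overlapping character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def single_terms(text):
    """Word-based representation: counts of lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_docs, represent):
    """Average the feature vectors of each class into a single centroid."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in labeled_docs:
        sums[label].update(represent(text))
        counts[label] += 1
    return {
        label: {k: v / counts[label] for k, v in vec.items()}
        for label, vec in sums.items()
    }


def classify(text, centroids, represent):
    """Assign the label whose centroid is most similar to the document."""
    vec = represent(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus, used only to exercise the functions above.
TOY_DOCS = [
    ("the striker scored a late goal", "sports"),
    ("the goalkeeper saved the penalty", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
]

term_centroids = train_centroids(TOY_DOCS, single_terms)
ngram_centroids = train_centroids(TOY_DOCS, char_ngrams)
term_label = classify("the bank cut interest rates", term_centroids, single_terms)
ngram_label = classify("the bank cut interest rates", ngram_centroids, char_ngrams)
```

Passing the representation in as a function keeps the classifier fixed while the document representation varies, which mirrors the kind of comparison the description reports. Phrase-based and RDR representations need linguistic processing and are left out of this sketch.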
Total citations
[Bar chart: citations per year, 2010-2023]