Authors
Mostafa Keikha, Narjes Sharif Razavian, Farhad Oroumchian, Hassan Seyed Razi
Publication date
2008
Book
Survey of text mining II: Clustering, classification, and retrieval
Pages
219-232
Publisher
Springer London
Description
There are three factors involved in text classification: the classification model, the similarity measure, and the document representation. In this chapter, we focus on document representation and demonstrate that the choice of document representation has a profound impact on the quality of the classification. We also show that text quality affects the choice of document representation. In our experiments we used centroid-based classification, a simple and robust text classification scheme. We compare four types of document representation: N-grams, single terms, phrases, and a logic-based document representation called RDR. The N-gram representation is a string-based representation with no linguistic processing. The single-term approach is based on words with minimal linguistic processing. The phrase approach is based on linguistically formed phrases and single …
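The description names centroid-based classification and contrasts a string-based N-gram representation with a word-based single-term one. The sketch below shows how these pieces fit together. It assumes raw term-frequency vectors and cosine similarity, because the chapter's exact weighting scheme is not given here. The helper names (`char_ngrams`, `single_terms`, `CentroidClassifier`) and the toy documents are hypothetical, and the phrase and RDR representations are not modeled.

```python
import math
import re
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """String-based representation: overlapping character n-grams, no linguistic processing."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def single_terms(text):
    """Word-based representation with minimal processing: lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def normalize(vec):
    """Scale a sparse vector (dict of feature -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(k, 0.0) for k, v in a.items())


class CentroidClassifier:
    """Hypothetical centroid classifier: one normalized mean vector per class."""

    def __init__(self, represent):
        self.represent = represent  # function mapping text -> sparse feature counts
        self.centroids = {}

    def fit(self, docs, labels):
        sums = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            # Add each unit-length document vector to its class total
            for k, v in normalize(self.represent(doc)).items():
                sums[label][k] += v
        # Normalizing the class total gives the same direction as the class mean
        self.centroids = {label: normalize(vec) for label, vec in sums.items()}
        return self

    def predict(self, doc):
        vec = normalize(self.represent(doc))
        # Assign the class whose centroid is most similar to the document
        return max(self.centroids, key=lambda c: cosine(vec, self.centroids[c]))


# Hypothetical toy corpus, for illustration only
train_docs = [
    "the striker scored a late goal",
    "the keeper saved the penalty in the match",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
]
train_labels = ["sport", "sport", "finance", "finance"]

word_clf = CentroidClassifier(single_terms).fit(train_docs, train_labels)
ngram_clf = CentroidClassifier(char_ngrams).fit(train_docs, train_labels)
```

Swapping the `represent` function while keeping the model and the similarity measure fixed isolates the effect of the document representation, which is the variable the chapter studies.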
Total citations
[Per-year citation chart, 2007 to 2023]