Authors
Sunita Sarkar, Arindam Roy, BS Purkayastha
Publication date
2014/6
Journal
Int. J. Nat. Lang. Comput. (IJNLC)
Volume
3
Issue
3
Description
The volume of digitized text documents on the web has been increasing rapidly. Because the web holds a huge collection of data, documents need to be grouped (clustered) for speedy information retrieval. Document clustering partitions a collection of documents into groups such that the documents within each group are similar to each other and dissimilar to documents in other groups. The quality of a clustering result depends greatly on the representation of the text and on the clustering algorithm. This paper presents a comparative analysis of three algorithms, namely K-means, Particle Swarm Optimization (PSO), and a hybrid PSO+K-means algorithm, for clustering text documents using WordNet. The common way of representing a text document is as a bag of terms. The bag-of-terms representation is often unsatisfactory because it does not exploit semantics. In this paper, texts are represented in terms of the synsets corresponding to each word, so the bag-of-terms representation is enriched with synonyms from WordNet. The K-means, PSO, and hybrid PSO+K-means algorithms are applied to cluster text in the Nepali language. Experimental evaluation is performed using intra-cluster similarity and inter-cluster similarity.
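As a rough illustration of the ideas in this abstract, the sketch below maps words to synset identifiers before building a bag-of-terms vector, then scores a fixed clustering by intra-cluster and inter-cluster similarity. Everything in it is an assumption made for illustration: the `SYNSETS` table is a hypothetical toy stand-in for WordNet (in English, not Nepali), the cluster labels are assigned by hand, not produced by K-means or PSO, and both measures are taken to be the mean pairwise cosine similarity, which may differ from the definitions used in the paper.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy synonym table standing in for WordNet synsets.
SYNSETS = {
    "car": "syn_vehicle", "automobile": "syn_vehicle",
    "doctor": "syn_physician", "physician": "syn_physician",
}

def to_concepts(text):
    """Bag of terms where each known word is replaced by its synset id."""
    return Counter(SYNSETS.get(tok, tok) for tok in text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_pairwise(vectors, labels, same_cluster):
    """Mean cosine over document pairs that share (or do not share) a cluster."""
    pairs = [
        (i, j) for i, j in combinations(range(len(vectors)), 2)
        if (labels[i] == labels[j]) == same_cluster
    ]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

docs = [
    "car engine repair",
    "automobile engine service",
    "doctor hospital visit",
    "physician hospital care",
]
labels = [0, 0, 1, 1]  # hand-assigned clusters, for illustration only

vectors = [to_concepts(d) for d in docs]
intra = mean_pairwise(vectors, labels, same_cluster=True)
inter = mean_pairwise(vectors, labels, same_cluster=False)
print(f"intra-cluster similarity: {intra:.3f}")
print(f"inter-cluster similarity: {inter:.3f}")
```

A good clustering should score high on the first measure and low on the second. In this toy data the synset mapping is what lets "car" and "automobile" count as the same term, which raises the similarity of the first two documents compared with a plain bag of terms.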
Total citations
[Chart: citations per year, 2015 to 2023]