Authors
Ankita Dhar, Niladri Sekhar Dash, Kaushik Roy
Publication date
2017/9/15
Conference
2017 3rd International Conference on Advances in Computing, Communication & Automation (ICACCA) (Fall)
Pages
1-6
Publisher
IEEE
Description
This paper explores the use of two similarity measures for categorizing Bangla text documents into their respective domains. Cosine Similarity and Euclidean Distance have been used as the similarity measures on a vector space model based on TF-IDF features. The domains of interest are Business, State, Medical, Sports, and Science, and texts from these domains are used as inputs for analysis. Recognition accuracies of 95.80% for Cosine Similarity and 95.20% for Euclidean Distance are achieved on 1000 text documents. This confirms that an unsupervised feature extraction technique may be treated as one of the useful methods for automatic text classification in Bangla (and for other Indian language documents) if the input texts are not pre-classified based on certain predefined linguistic or statistical parameters. Comparative experiments on the dataset using several classification algorithms show that the distance measures perform …
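The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: documents are mapped to TF-IDF vectors, and a new document takes the label of the training document that is closest under either Cosine Similarity or Euclidean Distance. The function names (`fit_idf`, `vectorize`, `classify`), the smoothed IDF formula, the nearest-neighbour decision rule, and the toy English tokens are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency for each term (assumed formula)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are ignored."""
    tf = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}

def cosine_similarity(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def euclidean_distance(a, b):
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in keys))

def classify(tokens, train, idf, measure="cosine"):
    """Return the label of the nearest training vector (hypothetical rule)."""
    q = vectorize(tokens, idf)
    if measure == "cosine":
        # Higher similarity means closer.
        return max(train, key=lambda item: cosine_similarity(q, item[0]))[1]
    # Lower distance means closer.
    return min(train, key=lambda item: euclidean_distance(q, item[0]))[1]

# Toy corpus with placeholder tokens; the paper uses Bangla documents.
labelled_docs = [
    ("match goal team player".split(), "Sports"),
    ("market stock profit bank".split(), "Business"),
    ("doctor patient hospital medicine".split(), "Medical"),
]
idf = fit_idf([tokens for tokens, _ in labelled_docs])
train = [(vectorize(tokens, idf), label) for tokens, label in labelled_docs]

query = "team player goal".split()
print(classify(query, train, idf, measure="cosine"))
print(classify(query, train, idf, measure="euclidean"))
```

The two measures can rank neighbours differently because Euclidean Distance is sensitive to vector length while Cosine Similarity is not, which is one plausible reason for the small gap between the two reported accuracies.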
Total citations
[Chart: citations per year, 2018 to 2024]