
Classification of text documents through distance measurement: an experiment with multi-domain bangla text documents

Authors

Ankita Dhar, Niladri Sekhar Dash, Kaushik Roy

Publication date

2017/9/15

Conference

2017 3rd International Conference on Advances in Computing, Communication & Automation (ICACCA) (Fall)

Pages

1-6

Publisher

IEEE

Description

This paper explores the use of two similarity measures for categorizing Bangla text documents into their respective domains. Cosine Similarity and Euclidean Distance have been used as the similarity measures on a vector space model based on TF-IDF features. The domains of interest are Business, State, Medical, Sports, and Science, and texts from these domains are used as inputs for analysis. Recognition accuracies of 95.80% for Cosine Similarity and 95.20% for Euclidean Distance are achieved on 1000 text documents. This confirms that an unsupervised feature extraction technique may be treated as one of the useful methods for automatic text classification in Bangla (and in other Indian-language documents) if input texts are not pre-classified based on certain predefined linguistic or statistical parameters. Comparative experiments on the dataset using several classification algorithms show that the distance measures perform …
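The description does not say how the two measures are turned into domain labels. The sketch below is one plausible reading and is not the authors' code. It assumes the following:

- Each domain is represented by the centroid (mean) of its training TF-IDF vectors.
- A document is assigned to the centroid with the highest Cosine Similarity or the lowest Euclidean Distance.
- Tokenization is plain whitespace splitting.
- idf is computed as log(N/df).

All function names (`fit_idf`, `tfidf`, `train`, `classify`, etc.) are hypothetical. The toy training texts are English stand-ins, not the paper's Bangla corpus.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: whitespace tokenization stands in for the paper's (unstated) Bangla preprocessing.
    return text.split()

def fit_idf(token_lists):
    """idf(t) = log(N / df(t)), where df(t) counts training documents containing term t."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector (relative term frequency x idf); terms unseen in training are dropped."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def cosine_similarity(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def euclidean_distance(u, v):
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))

def centroid(vectors):
    """Component-wise mean of sparse vectors (hypothetical per-domain prototype)."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def train(labelled_docs):
    """labelled_docs: list of (text, domain) pairs -> (idf table, {domain: centroid vector})."""
    token_lists = [tokenize(text) for text, _ in labelled_docs]
    idf = fit_idf(token_lists)
    by_domain = defaultdict(list)
    for tokens, (_, domain) in zip(token_lists, labelled_docs):
        by_domain[domain].append(tfidf(tokens, idf))
    return idf, {d: centroid(vecs) for d, vecs in by_domain.items()}

def classify(text, idf, centroids, measure="cosine"):
    vec = tfidf(tokenize(text), idf)
    if measure == "cosine":
        # Higher cosine similarity means a closer domain.
        return max(centroids, key=lambda d: cosine_similarity(vec, centroids[d]))
    # Smaller Euclidean distance means a closer domain.
    return min(centroids, key=lambda d: euclidean_distance(vec, centroids[d]))

# Toy English stand-ins for two of the paper's five domains (not the paper's data).
TRAIN = [
    ("match goal team player score", "Sports"),
    ("team won the match cup", "Sports"),
    ("patient doctor hospital medicine", "Medical"),
    ("doctor treated the patient fever", "Medical"),
]
idf, centroids = train(TRAIN)
print(classify("the team scored a goal in the match", idf, centroids))  # Sports
print(classify("the doctor gave the patient medicine", idf, centroids, "euclidean"))  # Medical
```

Sparse dictionaries keep the sketch dependency-free. A 1-nearest-neighbour rule over individual training documents would be an equally plausible reading of "classification through distance measurement", and the abstract does not settle which was used.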

Total citations

Cited by 37

Citations per year: 2018: 3, 2019: 5, 2020: 6, 2021: 10, 2022: 5, 2023: 5, 2024: 3
