Authors
Didik Dwi Prasetya, Aji Prasetya Wibawa, Tsukasa Hirashima
Publication date
2018/3/31
Journal
International Journal of Advances in Intelligent Informatics
Volume
4
Issue
1
Pages
61-69
Description
A pressing problem in information management today is rapid data growth [1]. Text similarity measurement is a text mining approach that can help overcome this problem. Finding the similarity between words is a primary stage in measuring sentence, paragraph, and document similarity [2]. Text similarity approaches can make it easier for people to find relevant information. Similarity measurement is the backbone of successful text mining operations such as searching and information retrieval (IR), text classification, information extraction (IE), document clustering [3], sentiment analysis, machine translation, text summarization, and natural language processing (NLP).
Lexical and semantic word similarity is an essential element of sentence, paragraph, and document similarity measurement [2]. Lexical similarity is the degree to which two given strings are similar in their character sequences. A score of one (1) means the words are 100% lexically identical. In contrast, a score of zero (0) indicates that the given strings have no word in common. Semantic similarity, on the other hand, represents the likeness between texts and documents on the basis of their contextual meaning. For example, the pair “book” and “cook” has high lexical similarity, but the two words are not semantically related. The pair “car” and “wheel” appears to have no lexical similarity, yet the two words are closely related semantically because both are automotive terms.
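The lexical score described above can be illustrated with a small sketch. It assumes a normalized Levenshtein (edit) distance as the character-level measure. This choice is only an illustration and is not stated in the description as the authors' method, and the function names `levenshtein` and `lexical_similarity` are hypothetical. The sketch captures lexical similarity only. Semantic similarity would need an external resource such as a thesaurus or corpus statistics.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Score in [0, 1]: 1 means identical strings, 0 means that
    every character position has to be edited (assumed normalization)."""
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    print(lexical_similarity("book", "cook"))   # 0.75
    print(lexical_similarity("car", "wheel"))   # 0.0
    print(lexical_similarity("book", "book"))   # 1.0
```

Under this assumed measure, “book” and “cook” differ by one substitution out of four characters, so they score 0.75, while “car” and “wheel” share no characters and score 0, even though they are semantically related.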
Total citations
[Chart: citations per year, 2018 to 2024]
Scholar articles
DD Prasetya, AP Wibawa, T Hirashima - International Journal of Advances in Intelligent …, 2018