The performance of text similarity algorithms

Authors

Didik Dwi Prasetya, Aji Prasetya Wibawa, Tsukasa Hirashima

Publication date

2018/3/31

Journal

International Journal of Advances in Intelligent Informatics

Volume

Issue

Pages

61-69

Description

A pressing problem in information management today is rapid data growth [1]. Text similarity measurement is a text mining approach that can help overcome this problem. Finding the similarity between words is the first step toward measuring sentence, paragraph, and document similarity [2]. Text similarity approaches can make it easier for people to find relevant information. They are the backbone of successful text mining operations such as search and information retrieval (IR), text classification, information extraction (IE), document clustering [3], sentiment analysis, machine translation, text summarization, and natural language processing (NLP).

Lexical and semantic word similarity are essential elements of sentence, paragraph, and document similarity measurement [2]. Lexical similarity is the degree to which two given strings are similar in their character sequences. A score of one (1) means the words are 100% lexically identical. In contrast, zero (0) indicates that the given strings have no words in common. Semantic similarity, on the other hand, represents the likeness between texts and documents on the basis of their contextual meaning. For example, “book” and “cook” have high lexical similarity but are not semantically related. “Car” and “wheel” appear to have no lexical similarity, yet they are closely related semantically, as both are automotive terms.
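
The 0-to-1 lexical score described above can be illustrated with a minimal sketch. The abstract does not say which measure produces the score, so the choice here is an assumption made purely for illustration: a character-level Levenshtein edit distance, normalized by the length of the longer string. The function names `levenshtein` and `lexical_similarity` are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Illustrative score in [0, 1]: 1 means identical strings,
    0 means every position of the longer string must be edited.
    Normalizing by the longer length is an assumption of this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(lexical_similarity("book", "cook"))  # one substitution over four characters
print(lexical_similarity("car", "wheel"))  # no characters in common
```

Under this particular measure, “book” and “cook” score 0.75 and “car” and “wheel” score 0.0, which mirrors the contrast drawn in the text. The score says nothing about meaning, which is why the semantically related pair “car” and “wheel” still comes out at zero.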

Total citations

Cited by 74

Citations per year: 2018: 3, 2019: 5, 2020: 6, 2021: 13, 2022: 26, 2023: 14, 2024: 7
