Web site keyword selection method by considering semantic similarity based on word2vec

D Lee, K Kim - The Journal of Society for e-Business Studies, 2018 - koreascience.kr
Abstract
Extracting keywords that represent a document is important because the keywords support automated services such as document search, classification, and recommendation, and also convey the document's content quickly. However, extracting keywords from web site documents by word frequency or by graph algorithms based on word co-occurrence has two problems: the web page structure can introduce various words that are unrelated to the topic, and the limited performance of Korean tokenizers makes it difficult to extract semantic keywords. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. A filtering step then removes inconsistent keywords to produce the final semantic keywords. Experiments on real web pages of small businesses show that the proposed method improves performance by 34.52% over a keyword selection technique based on statistical similarity. These results confirm that considering semantic similarity between words and removing inconsistent keywords improves keyword extraction from documents.
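The abstract does not give the scoring or filtering rules, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each candidate keyword is scored by cosine similarity to the centroid of all candidate vectors and that candidates below a threshold are dropped as inconsistent. The names `select_keywords`, `TOY_VECTORS`, and the threshold of 0.5 are hypothetical, and the three-dimensional vectors are hand-written stand-ins for embeddings that would normally come from a trained word2vec model.

```python
import math

# Hypothetical toy embeddings; a real pipeline would load word2vec vectors instead.
TOY_VECTORS = {
    "bakery": [0.90, 0.10, 0.00],
    "bread": [0.80, 0.20, 0.10],
    "cake": [0.85, 0.15, 0.05],
    "login": [0.00, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_keywords(candidates, vectors, threshold=0.5):
    """Keep candidates whose vector is close to the centroid of all candidates.

    Assumption: similarity to the centroid stands in for "semantic similarity",
    and the threshold stands in for the "inconsistent keyword" filter.
    """
    # Words without an embedding cannot be scored, so they are skipped.
    known = [w for w in candidates if w in vectors]
    if not known:
        return []
    dim = len(vectors[known[0]])
    centroid = [sum(vectors[w][i] for w in known) / len(known) for i in range(dim)]
    scored = [(w, cosine(vectors[w], centroid)) for w in known]
    kept = [(w, s) for w, s in scored if s >= threshold]
    # Highest-scoring keywords first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    page_words = ["bakery", "bread", "cake", "login", "copyright"]
    for word, score in select_keywords(page_words, TOY_VECTORS):
        print(f"{word}\t{score:.3f}")
```

With these toy vectors, the page-structure word "login" falls below the threshold and "copyright" is skipped because it has no vector, while the three topic words are kept. Scoring against a centroid is only one possible design; pairwise similarity between candidates or similarity to a seed keyword would fit the same description.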