Authors
Feng Zou, Fu Lee Wang, Xiaotie Deng, Song Han, Lu Sheng Wang
Publication date
2006/4/16
Journal
Proceedings of the 5th WSEAS international conference on Applied computer science
Pages
1010-1015
Publisher
World Scientific and Engineering Academy and Society (WSEAS)
Description
In modern information retrieval systems, effective indexing can be achieved by removing stop words. To date, many stop word lists have been developed for the English language. However, no standard stop word list has yet been constructed for the Chinese language. With the rapid development of Chinese information retrieval, exploring Chinese stop word lists has become critical. In this paper, to save time and relieve the burden of manual stop word selection, we propose an automatic aggregated methodology based on statistical and information models for extracting a stop word list in Chinese. Result analysis shows that our stop list is comparable to a general English stop word list and is much more general than other Chinese stop lists. Our stop word extraction algorithm is a promising technique that saves the time needed for manual generation and constructs a standard. It could be applied to other languages in the future.
Total citations
[Chart: citations per year, 2006 to 2024]
Scholar articles
F Zou, FL Wang, X Deng, S Han, LS Wang - Proceedings of the 5th WSEAS international …, 2006