Authors
J-Y Nie, Fuji Ren
Publication date
1999
Journal
Information Processing & Management
Volume
35
Issue
4
Pages
443-462
Publisher
Elsevier Science Publishing Company, Inc.
Description
Several experimental studies have been conducted to compare words and n-grams with respect to their performance in Chinese Information Retrieval (IR). These studies claim that n-grams (in particular bigrams) perform as well as, or even better than, words. In this paper, we propose a relaxed segmentation process for Chinese which extracts not only the longest words, but also all the shorter words they contain. Special rules are also designed to recognize and normalize special words such as proper names and nominal pre-determiners. Our experiments show that IR based on this segmentation gives slightly higher effectiveness than bigrams. In addition, it requires less time and space for document and query processing. We also tested combining words with bigrams in IR and using top-ranked documents for query expansion. These techniques proved to be effective.
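The contrast between relaxed word segmentation and bigram indexing can be illustrated with a minimal Python sketch. It is not the authors' implementation: the description above does not specify the matching strategy, so greedy forward maximum matching is assumed here, and the function names, the `max_len` parameter, and the four-entry `TOY_LEXICON` are all hypothetical. The special rules for proper names and nominal pre-determiners are not modeled.

```python
def longest_match_segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching (an assumed strategy).

    Characters not covered by any lexicon entry become one-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def implied_short_words(word, lexicon):
    """Return lexicon entries (length >= 2) strictly contained in `word`."""
    found = []
    for start in range(len(word)):
        for end in range(start + 2, len(word) + 1):
            piece = word[start:end]
            if piece != word and piece in lexicon:
                found.append(piece)
    return found


def relaxed_segment(text, lexicon, max_len=4):
    """Index each longest word together with the shorter words it contains."""
    index_terms = []
    for word in longest_match_segment(text, lexicon, max_len):
        index_terms.append(word)
        index_terms.extend(implied_short_words(word, lexicon))
    return index_terms


def bigrams(text):
    """Overlapping character bigrams, the baseline the paper compares against."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"信息", "检索", "信息检索", "系统"}

if __name__ == "__main__":
    sample = "信息检索系统"
    print(relaxed_segment(sample, TOY_LEXICON))  # ['信息检索', '信息', '检索', '系统']
    print(bigrams(sample))  # ['信息', '息检', '检索', '索系', '系统']
```

On this toy input the relaxed segmentation yields four index terms, all of them dictionary words, while bigram indexing yields five terms, two of which ("息检", "索系") straddle word boundaries. This is only meant to suggest why word-based indexing might need less space; it is not evidence for the paper's measured results.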
Total citations
[Chart: citations per year, 2000 to 2023]