n-gram/2l: A space and time efficient two-level n-gram inverted index structure

Authors

Min-Soo Kim, Kyu-Young Whang, Jae-Gil Lee, Min-Jae Lee

Publication date

2005/8/30

Book

Proceedings of the 31st international conference on Very large data bases

Pages

325-336

Description

The n-gram inverted index has two major advantages: it is language-neutral and error-tolerant. Due to these advantages, it has been widely used in information retrieval and in similar sequence matching for DNA and protein databases. Nevertheless, the n-gram inverted index also has drawbacks: its size tends to be very large, and query performance tends to be poor. In this paper, we propose the two-level n-gram inverted index (simply, the n-gram/2L index), which significantly reduces the size and improves the query performance while preserving the advantages of the n-gram inverted index. The proposed index eliminates the redundancy of the position information that exists in the n-gram inverted index. The proposed index is constructed in two steps: 1) extracting subsequences of length m from documents and 2) extracting n-grams from those subsequences. We formally prove that this two-step construction is identical to the relational normalization process that removes the redundancy caused by a non-trivial multivalued dependency. The n-gram/2L index has excellent properties: 1) it significantly reduces the size and improves the performance compared with the n-gram inverted index, with these improvements becoming more marked as the database size gets larger; 2) the query processing time increases only very slightly as the query length gets longer. Experimental results using databases of 1 GByte show that the size of the n-gram/2L index is reduced by a factor of up to 1.9 to 2.7 and, at the same time, the query performance is improved by up to 13.1 times compared with those of the n-gram inverted index.
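
The two-step construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the in-memory dictionaries, and the choice to overlap adjacent subsequences by n - 1 characters (so that no n-gram is lost at a boundary) are assumptions made for this example. It also leaves the final subsequence shorter than m rather than padding it.

```python
from collections import defaultdict


def ngrams(text, n):
    """Yield (offset, n-gram) pairs from a sliding window of size n."""
    for i in range(len(text) - n + 1):
        yield i, text[i:i + n]


def subsequences(text, m, n):
    """Yield (offset, subsequence) pairs of length at most m.

    Assumption of this sketch: adjacent subsequences overlap by n - 1
    characters, so every n-gram of the text lies inside exactly one of them.
    """
    if not text:
        return
    step = m - (n - 1)
    i = 0
    while True:
        yield i, text[i:i + m]
        if i + m >= len(text):
            break
        i += step


def build_two_level_index(docs, m, n):
    """Step 1: subsequences from documents. Step 2: n-grams from subsequences."""
    if m < n:
        raise ValueError("m must be at least n")
    back = defaultdict(list)   # subsequence -> [(doc_id, offset in document)]
    for doc_id, text in enumerate(docs):
        for offset, subseq in subsequences(text, m, n):
            back[subseq].append((doc_id, offset))
    front = defaultdict(list)  # n-gram -> [(subsequence, offset in subsequence)]
    for subseq in back:        # each distinct subsequence is processed once
        for offset, gram in ngrams(subseq, n):
            front[gram].append((subseq, offset))
    return back, front


def build_flat_index(docs, n):
    """Conventional n-gram inverted index, for comparison."""
    flat = defaultdict(list)   # n-gram -> [(doc_id, offset in document)]
    for doc_id, text in enumerate(docs):
        for offset, gram in ngrams(text, n):
            flat[gram].append((doc_id, offset))
    return flat


def positions(back, front, gram):
    """Recover document-level (doc_id, offset) postings for an n-gram."""
    found = set()
    for subseq, inner in front.get(gram, []):
        for doc_id, outer in back[subseq]:
            found.add((doc_id, outer + inner))
    return sorted(found)
```

In this sketch, a subsequence that occurs in many documents has its n-gram offsets stored only once in `front`, while `back` records where the subsequence itself occurs. Joining the two levels with `positions` yields the same postings as the flat index, which is the sense in which the position information is stored without redundancy.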

Total citations

Cited by 130

Citations per year: 2006: 4, 2007: 7, 2008: 12, 2009: 13, 2010: 7, 2011: 14, 2012: 15, 2013: 8, 2014: 13, 2015: 6, 2016: 9, 2017: 7, 2018: 2, 2019: 3, 2020: 4, 2021: 2, 2022: 1, 2023: 1

Scholar articles

n-gram/2l: A space and time efficient two-level n-gram inverted index structure

MS Kim, KY Whang, JG Lee, MJ Lee - Proceedings of the 31st international conference on …, 2005

Cited by 127, all 18 versions

n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure*

W Kyu-Young - Journal of KIISE: Databases, 2006

Cited by 3, all 3 versions