Hybrid LSH: faster near neighbors reporting in high-dimensional space

N Pham - arXiv preprint arXiv:1607.06179, 2016 - arxiv.org

We study the $r$-near neighbors reporting problem ($r$-NN), i.e., reporting all points in a high-dimensional point set $S$ that lie within a radius $r$ of a given query point $q$. Our approach builds on the locality-sensitive hashing (LSH) framework because of its appealing asymptotic sublinear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving $r$-NN is that its performance is sensitive to data-dependent and query-dependent parameters. On datasets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy that combines LSH-based search and linear search for $r$-NN in high-dimensional space. By integrating an auxiliary data structure into the LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This lets us choose, per query, whichever of LSH-based search and linear search is expected to perform better. Moreover, the integrated data structure is time efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world datasets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space.
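
The per-query choice described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `HybridLSH`, the Gaussian-projection hash family, the `cost_ratio` threshold, and above all the cost estimate. The abstract does not say what the auxiliary data structure is, so the sketch substitutes the simplest stand-in, the sum of the sizes of the buckets the query falls into, which is an upper bound on the number of distinct candidates LSH would have to check.

```python
import math
import random
from collections import defaultdict


class HybridLSH:
    """Toy r-NN reporter that picks LSH search or a linear scan per query."""

    def __init__(self, points, r, num_tables=8, hashes_per_table=4,
                 width=4.0, seed=0):
        rng = random.Random(seed)
        self.points = points
        self.r = r
        self.width = width
        dim = len(points[0])
        # Each table concatenates k hashes of the form floor((a.x + b) / w),
        # with Gaussian a and uniform offset b (an assumed hash family).
        self.projections = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)],
              rng.uniform(0.0, width))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        for idx, p in enumerate(points):
            for t, table in enumerate(self.tables):
                table[self._key(t, p)].append(idx)

    def _key(self, t, x):
        """Bucket key of point x in table t."""
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.width)
            for a, b in self.projections[t]
        )

    def estimate_lsh_cost(self, q):
        # Stand-in estimator (assumption): total collisions over all tables,
        # an upper bound on the number of distinct candidates.
        return sum(len(table.get(self._key(t, q), ()))
                   for t, table in enumerate(self.tables))

    def query(self, q, cost_ratio=1.0):
        """Return (strategy, sorted indices of points within r of q)."""
        if self.estimate_lsh_cost(q) >= cost_ratio * len(self.points):
            strategy = "linear"
            candidates = range(len(self.points))
        else:
            strategy = "lsh"
            candidates = set()
            for t, table in enumerate(self.tables):
                candidates.update(table.get(self._key(t, q), ()))
        # Both strategies finish by verifying true distances.
        hits = sorted(i for i in candidates
                      if math.dist(self.points[i], q) <= self.r)
        return strategy, hits
```

In this sketch the linear branch is exact, while the LSH branch can miss near neighbors that never collide with the query, as with any LSH scheme. Because the collision count ignores duplicates across tables, it overestimates the LSH cost and therefore leans toward linear search; a distinct-count estimator would give a tighter figure.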

Cited by: 10 · All 6 versions
