A generic method for accelerating LSH-based similarity join processing

O Jafari, P Maurya, P Nagarkar, KM Islam… - arXiv preprint arXiv …, 2021 - arxiv.org

Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many
diverse application domains. Locality Sensitive Hashing (LSH) is one of the most popular …

Cited by 108 · Related articles · All 3 versions

Improving malicious URLs detection via feature engineering: Linear and nonlinear space transformation methods

T Li, G Kou, Y Peng - Information Systems, 2020 - Elsevier

In malicious URLs detection, traditional classifiers are challenged because the data volume
is huge, patterns are changing over time, and the correlations among features are …

Cited by 186 · Related articles · All 2 versions

A review for weighted minhash algorithms

W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Data similarity (or distance) computation is a fundamental research topic which underpins
many high-level applications based on similarity measures in machine learning and data …

Cited by 45 · Related articles · All 8 versions

Refining codes for locality sensitive hashing

H Liu, W Zhou, Z Wu, S Zhang, G Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Learning to hash is of particular interest in information retrieval for large-scale data due to its
high efficiency and effectiveness. Most studies in hashing concentrate on constructing new …

Cited by 8 · Related articles · All 6 versions

Serving deep learning models with deduplication from relational databases

L Zhou, J Chen, A Das, H Min, L Yu, M Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org

There are significant benefits to serve deep learning models from relational databases. First,
features extracted from databases do not need to be transferred to any decoupled deep …

Cited by 19 · Related articles · All 9 versions

A fast LSH-based similarity search method for multivariate time series

C Yu, L Luo, LLH Chan, T Rakthanmanon… - Information Sciences, 2019 - Elsevier

Due to advances in mobile devices and sensors, there has been an increasing interest in
the analysis of multivariate time series. Identifying similar time series is a core subroutine in …

Cited by 38 · Related articles · All 3 versions

PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search

B Zheng, X Zhao, L Weng, QVH Nguyen, H Liu… - The VLDB Journal, 2022 - Springer

Nearest neighbor (NN) search is inherently computationally expensive in high-dimensional
spaces due to the curse of dimensionality. As a well-known solution, locality-sensitive …

Cited by 12 · Related articles · All 8 versions

An effective and scalable framework for authorship attribution query processing

R Sarwar, C Yu, N Tungare, K Chitavisutthivong… - IEEE …, 2018 - ieeexplore.ieee.org

Authorship attribution aims at identifying the original author of an anonymous text from a
given set of candidate authors and has a wide range of applications. The main challenge in …

Cited by 22 · Related articles · All 9 versions

Improved consistent weighted sampling revisited

W Wu, B Li, L Chen, C Zhang… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org

Min-Hash is a popular technique for efficiently estimating the Jaccard similarity of binary
sets. Consistent Weighted Sampling (CWS) generalizes the Min-Hash scheme to sketch …

Cited by 28 · Related articles · All 4 versions

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings

Y Wang - arXiv preprint arXiv:2204.07922, 2022 - arxiv.org

Similarity query is the family of queries based on some similarity metrics. Unlike the
traditional database queries which are mostly based on value equality, similarity queries aim …

Cited by 7 · Related articles · All 2 versions
