In malicious URL detection, traditional classifiers are challenged because the data volume is huge, patterns change over time, and the correlations among features are …
W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data similarity (or distance) computation is a fundamental research topic which underpins many high-level applications based on similarity measures in machine learning and data …
H Liu, W Zhou, Z Wu, S Zhang, G Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Learning to hash is of particular interest in information retrieval for large-scale data due to its high efficiency and effectiveness. Most studies in hashing concentrate on constructing new …
There are significant benefits to serving deep learning models from relational databases. First, features extracted from databases do not need to be transferred to any decoupled deep …
Due to advances in mobile devices and sensors, there has been an increasing interest in the analysis of multivariate time series. Identifying similar time series is a core subroutine in …
Nearest neighbor (NN) search is inherently computationally expensive in high-dimensional spaces due to the curse of dimensionality. As a well-known solution, locality-sensitive …
Authorship attribution aims at identifying the original author of an anonymous text from a given set of candidate authors and has a wide range of applications. The main challenge in …
Min-Hash is a popular technique for efficiently estimating the Jaccard similarity of binary sets. Consistent Weighted Sampling (CWS) generalizes the Min-Hash scheme to sketch …
Y Wang - arXiv preprint arXiv:2204.07922, 2022 - arxiv.org
Similarity queries are the family of queries based on similarity metrics. Unlike traditional database queries, which are mostly based on value equality, similarity queries aim …