Visualization of very large high-dimensional data sets as minimum spanning trees

D Probst, JL Reymond - Journal of Cheminformatics, 2020 - Springer
The chemical sciences are producing an unprecedented amount of large, high-dimensional
data sets containing chemical structures and associated properties. However, there are …

A review for weighted minhash algorithms

W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data similarity (or distance) computation is a fundamental research topic which underpins
many high-level applications based on similarity measures in machine learning and data …

Evolution of biosequence search algorithms: a brief survey

G Kucherov - Bioinformatics, 2019 - academic.oup.com
Motivation: Although modern high-throughput biomolecular technologies produce various
types of data, biosequence data remain at the core of bioinformatic analyses. However …

Hashing-accelerated graph neural networks for link prediction

W Wu, B Li, C Luo, W Nejdl - Proceedings of the Web Conference 2021, 2021 - dl.acm.org
Networks are ubiquitous in the real world. Link prediction, as one of the key problems for
network-structured data, aims to predict whether there exists a link between two nodes. The …

An LSH-based offloading method for IoMT services in integrated cloud-edge environment

X Xu, Q Huang, Y Zhang, S Li, L Qi, W Dou - ACM Transactions on …, 2021 - dl.acm.org
Benefiting from the massive available data provided by Internet of multimedia things (IoMT),
enormous intelligent services requiring information of various types to make decisions are …

Geo-graph-indistinguishability: Protecting location privacy for LBS over road networks

S Takagi, Y Cao, Y Asano, M Yoshikawa - … , SC, USA, July 15–17, 2019 …, 2019 - Springer
In recent years, Geo-Indistinguishability (GeoI) has been increasingly explored for
protecting location privacy in location-based services (LBSs). GeoI is considered a …

The duality of similarity and metric spaces

O Rozinek, J Mareš - Applied Sciences, 2021 - mdpi.com
We introduce a new mathematical basis for similarity space. For the first time, we describe
the relationship between distance and similarity from set theory. Then, we derive generally …

Locality sensitive hashing in Fourier frequency domain for soft set containment search

I Roy, R Agarwal, S Chakrabarti… - Advances in Neural …, 2023 - proceedings.neurips.cc
In many search applications related to passage retrieval, text entailment, and subgraph
search, the query and each 'document' is a set of elements, with a document being relevant if …

Weighted minwise hashing beats linear sketching for inner product estimation

A Bessa, M Daliri, J Freire, C Musco, C Musco… - Proceedings of the …, 2023 - dl.acm.org
We present a new approach for independently computing compact sketches that can be
used to approximate the inner product between pairs of high-dimensional vectors. Based on …

Variance reduction in feature hashing using MLE and control variate method

BD Verma, R Pratap, M Thakur - Machine Learning, 2022 - Springer
The feature hashing algorithm introduced by Weinberger et al. is a popular dimensionality
reduction algorithm that compresses high dimensional data points into low dimensional data …