An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement

W Li, Y Zhang, Y Sun, W Wang, M Li… - … on Knowledge and …, 2019 - ieeexplore.ieee.org
Nearest neighbor search is a fundamental and essential operation in applications from
many domains, such as databases, machine learning, multimedia, and computer vision …

When large language models meet vector databases: A survey

Z Jing, Y Su, Y Han, B Yuan, H Xu, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey explores the synergistic potential of Large Language Models (LLMs) and Vector
Databases (VecDBs), a burgeoning but rapidly evolving research area. With the proliferation …

NetLSD: hearing the shape of a graph

A Tsitsulin, D Mottin, P Karras, A Bronstein… - Proceedings of the 24th …, 2018 - dl.acm.org
Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in
terms of the expressiveness of the employed similarity measure and the efficiency of its …

Clustering with qualitative information

M Charikar, V Guruswami, A Wirth - Journal of Computer and System …, 2005 - Elsevier
We consider the problem of clustering a collection of elements based on pairwise judgments
of similarity and dissimilarity. Bansal et al. (in: Proceedings of 43rd FOCS, 2002, pp. 238 …

µTune: Auto-Tuned Threading for OLDI Microservices

A Sriraman, TF Wenisch - … Symposium on Operating Systems Design and …, 2018 - usenix.org
Modern On-Line Data Intensive (OLDI) applications have evolved from monolithic systems to
instead comprise numerous, distributed microservices interacting via Remote Procedure …

MD-HBase: A scalable multi-dimensional data infrastructure for location aware services

S Nishimura, S Das, D Agrawal… - 2011 IEEE 12th …, 2011 - ieeexplore.ieee.org
The ubiquity of location enabled devices has resulted in a wide proliferation of location
based applications and services. To handle the growing scale, database management …

μSuite: a benchmark suite for microservices

A Sriraman, TF Wenisch - 2018 IEEE International Symposium …, 2018 - ieeexplore.ieee.org
Modern On-Line Data Intensive (OLDI) applications have evolved from monolithic systems to
instead comprise numerous, distributed microservices interacting via Remote Procedure …

Towards efficient index construction and approximate nearest neighbor search in high-dimensional spaces

X Zhao, Y Tian, K Huang, B Zheng, X Zhou - Proceedings of the VLDB …, 2023 - dl.acm.org
The approximate nearest neighbor (ANN) search in high-dimensional spaces is a
fundamental but computationally very expensive problem. Many methods have been …