Representation bias in data: A survey on identification and resolution techniques

N Shahbazi, Y Lin, A Asudeh, HV Jagadish - ACM Computing Surveys, 2023 - dl.acm.org
Data-driven algorithms are only as good as the data they work with, while datasets,
especially social data, often fail to represent minorities adequately. Representation Bias in …

Finding related tables in data lakes for interactive data science

Y Zhang, ZG Ives - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org
Many modern data science applications build on data lakes, schema-agnostic repositories
of data files and data products that offer limited organization and management capabilities …

[BOOK][B] Data profiling

Z Abedjan, L Golab, F Naumann, T Papenbrock - 2019 - Springer
Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT
professionals and researchers who work with data have engaged in data profiling, at least …

Data dependencies for query optimization: a survey

J Kossmann, T Papenbrock, F Naumann - The VLDB Journal, 2022 - Springer
Effective query optimization is a core feature of any database management system. While
most query optimization techniques make use of simple metadata, such as cardinalities and …

Efficient discovery of approximate dependencies

S Kruse, F Naumann - Proceedings of the VLDB Endowment, 2018 - dl.acm.org
Functional dependencies (FDs) and unique column combinations (UCCs) form a valuable
ingredient for many data management tasks, such as data cleaning, schema recovery, and …

Data lakes: A survey of functions and systems

R Hai, C Koutras, C Quix… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Data lakes are becoming increasingly prevalent for Big Data management and data
analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses …

Discovery of approximate (and exact) denial constraints

EHM Pena, EC De Almeida, F Naumann - Proceedings of the VLDB …, 2019 - dl.acm.org
Maintaining data consistency is known to be hard. Recent approaches have relied on
integrity constraints to deal with the problem: correct and complete constraints naturally work …

Multi clustering recommendation system for fashion retail

P Bellini, LAI Palesi, P Nesi, G Pantaleo - Multimedia Tools and …, 2023 - Springer
Fashion retail has a large and ever-increasing popularity and relevance, allowing customers
to buy anytime finding the best offers and providing satisfactory experiences in the shops …

Designing succinct secondary indexing mechanism by exploiting column correlations

Y Wu, J Yu, Y Tian, R Sidle, R Barber - Proceedings of the 2019 …, 2019 - dl.acm.org
Database administrators construct secondary indexes on data tables to accelerate query
processing in relational database management systems (RDBMSs). These indexes are built …

[PDF] RENUVER: A Missing Value Imputation Algorithm based on Relaxed Functional Dependencies.

B Breve, L Caruccio, V Deufemia, G Polese - EDBT, 2022 - openproceedings.org
A missing value represents a piece of incomplete information that might appear
in database instances. Data imputation is the problem of filling missing values by means of …