Dataset discovery and exploration: A survey

NW Paton, J Chen, Z Wu - ACM Computing Surveys, 2023 - dl.acm.org
Data scientists are tasked with obtaining insights from data. However, suitable data is often
not immediately at hand, and there may be many potentially relevant datasets in a data lake …

Integrating data lake tables

A Khatiwada, R Shraga, W Gatterbauer… - Proceedings of the VLDB …, 2022 - dl.acm.org
We have made tremendous strides in providing tools for data scientists to discover new
tables useful for their analyses. But despite these advances, the proper integration of …

Tabular data augmentation for machine learning: Progress and prospects of embracing generative AI

L Cui, H Li, K Chen, L Shou, G Chen - arXiv preprint arXiv:2407.21523, 2024 - arxiv.org
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality
tabular data for model training remains a significant obstacle. Numerous works have …

Data lakes: A survey of functions and systems

R Hai, C Koutras, C Quix… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Data lakes are becoming increasingly prevalent for Big Data management and data
analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses …

Table discovery in data lakes: State-of-the-art and future directions

G Fan, J Wang, Y Li, RJ Miller - … of the 2023 International Conference on …, 2023 - dl.acm.org
Data discovery refers to a set of tasks that enable users and downstream applications to
explore and gain insights from massive collections of data sources such as data lakes. In …

Responsible data integration: Next-generation challenges

F Nargesian, A Asudeh, HV Jagadish - Proceedings of the 2022 …, 2022 - dl.acm.org
Data integration has been extensively studied by the data management community and is a
core task in the data pre-processing step of ML pipelines. When the integrated data is used …

Governor: Turning open government data portals into interactive databases

C Liu, A Usta, J Zhao, S Salihoglu - … of the 2023 CHI Conference on …, 2023 - dl.acm.org
The launch of open governmental data portals (OGDPs) has popularized the open data
movement of the last decade. Although the amount of data in OGDPs is increasing, their …

Towards distribution-aware query answering in data markets

A Asudeh, F Nargesian - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Addressing the increasing demand for data exchange has led to the development of data
markets that facilitate transactional interactions between data buyers and data sellers. Still …

Fainder: A fast and accurate index for distribution-aware dataset search

L Behme, S Galhotra, K Beedkar, V Markl - Proceedings of the VLDB …, 2024 - dl.acm.org
Efficient data discovery is crucial in the era of data-driven decision-making. However, current
practices face significant challenges due to the intricacies of identifying datasets with …

A Survey on Data Markets

J Zhang, Y Bi, M Cheng, J Liu, K Ren, Q Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Data is the new oil of the 21st century. The growing trend of trading data for greater welfare
has led to the emergence of data markets. A data market is any mechanism whereby the …