Re-thinking data strategy and integration for artificial intelligence: concepts, opportunities, and challenges

A Aldoseri, KN Al-Khalifa, AM Hamouda - Applied Sciences, 2023 - mdpi.com
The use of artificial intelligence (AI) is becoming more prevalent across industries such as
healthcare, finance, and transportation. Artificial intelligence is based on the analysis of …

Data lake management: challenges and opportunities

F Nargesian, E Zhu, RJ Miller, KQ Pu… - Proceedings of the VLDB …, 2019 - dl.acm.org
The ubiquity of data lakes has created fascinating new challenges for data management
research. In this tutorial, we review the state-of-the-art in data management for data lakes …

Data collection and quality challenges in deep learning: A data-centric ai perspective

SE Whang, Y Roh, H Song, JG Lee - The VLDB Journal, 2023 - Springer
Data-centric AI is at the center of a fundamental shift in software engineering where machine
learning becomes the new software, powered by big data and computing infrastructure …

A survey on data collection for machine learning: a big data-ai integration perspective

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …

On data lake architectures and metadata management

P Sawadogo, J Darmont - Journal of Intelligent Information Systems, 2021 - Springer
Over the past two decades, we have witnessed an exponential increase of data production
in the world. So-called big data generally come from transactional systems, and even more …

Google Dataset Search: Building a search engine for datasets in an open Web ecosystem

D Brickley, M Burgess, N Noy - The world wide web conference, 2019 - dl.acm.org
There are thousands of data repositories on the Web, providing access to millions of
datasets. National and regional governments, scientific publishers and consortia …

Automated machine learning: State-of-the-art and open challenges

R Elshawi, M Maher, S Sakr - arXiv preprint arXiv:1906.02287, 2019 - arxiv.org
With the continuous and vast increase in the amount of data in our digital world, it has been
acknowledged that the number of knowledgeable data scientists cannot scale to address …

Dataset search: a survey

A Chapman, E Simperl, L Koesten, G Konstantinidis… - The VLDB Journal, 2020 - Springer
Generating value from data requires the ability to find, access and make sense of datasets.
There are many efforts underway to encourage data sharing and reuse, from scientific …

Aurum: A data discovery system

RC Fernandez, Z Abedjan, F Koko… - 2018 IEEE 34th …, 2018 - ieeexplore.ieee.org
Organizations face a data discovery problem when their analysts spend more time looking
for relevant data than analyzing it. This problem has become commonplace in modern …

Automating large-scale data quality verification

S Schelter, D Lange, P Schmidt, M Celikel… - Proceedings of the …, 2018 - dl.acm.org
Modern companies and institutions rely on data to guide every single business process and
decision. Missing or incorrect information seriously compromises any decision process …