Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

How large language models will disrupt data management

RC Fernandez, AJ Elmore, MJ Franklin… - Proceedings of the …, 2023 - dl.acm.org
Large language models (LLMs), such as GPT-4, are revolutionizing software's ability to
understand, process, and synthesize language. The authors of this paper believe that this …

Big data management challenges in health research—a literature review

X Wang, C Williams, ZH Liu… - Briefings in …, 2019 - academic.oup.com
Big data management for information centralization (i.e. making data of interest findable) and
integration (i.e. making related data connectable) in health research is a defining challenge in …

Recent trends in knowledge graphs: theory and practice

S Tiwari, FN Al-Aswadi, D Gaurav - Soft Computing, 2021 - Springer
With the extensive growth of data that has been joined with the thriving development of the
Internet in this century, finding or getting valuable information and knowledge from these …

Data integration: After the teenage years

B Golshan, A Halevy, G Mihaila, WC Tan - Proceedings of the 36th ACM …, 2017 - dl.acm.org
The field of data integration has expanded significantly over the years, from providing a
uniform query and update interface to structured databases within an enterprise to the ability …

AutoKnow: Self-driving knowledge collection for products of thousands of types

XL Dong, X He, A Kan, X Li, Y Liang, J Ma… - Proceedings of the 26th …, 2020 - dl.acm.org
Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs
have firmly established themselves as valuable sources of information for search and …

Data management for machine learning: A survey

C Chai, J Wang, Y Luo, Z Niu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Machine learning (ML) has widespread applications and has revolutionized many
industries, but suffers from several challenges. First, sufficient high-quality training data is …

AceKG: A large-scale knowledge graph for academic data mining

R Wang, Y Yan, J Wang, Y Jia, Y Zhang… - Proceedings of the 27th …, 2018 - dl.acm.org
Most existing knowledge graphs (KGs) in academic domains suffer from problems of
insufficient multi-relational information, name ambiguity and improper data format for large …

Saga: A platform for continuous construction and serving of knowledge at scale

IF Ilyas, T Rekatsinas, V Konda, J Pound, X Qi… - Proceedings of the …, 2022 - dl.acm.org
We introduce Saga, a next-generation knowledge construction and serving platform for
powering knowledge-based applications at industrial scale. Saga follows a hybrid batch …

VolcanoML: speeding up end-to-end AutoML via scalable search space decomposition

Y Li, Y Shen, W Zhang, C Zhang, B Cui - The VLDB Journal, 2023 - Springer
End-to-end AutoML has attracted intensive interests from both academia and industry which
automatically searches for ML pipelines in a space induced by feature engineering …