Query optimizers rely on accurate cardinality estimates to produce good execution plans. Despite decades of research, existing cardinality estimators are inaccurate for complex …
Ensuring good data quality in biomedical sciences is crucial for reliable research outcomes, particularly as precision medicine continues to gain prominence. Missing values …
Data imputation and forecasting are the major research areas in environmental data engineering. Solving those critical issues has an immense impact on air pollution …
Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often …
IF Ilyas, T Rekatsinas - ACM Journal of Data and Information Quality …, 2022 - dl.acm.org
The last few years witnessed significant advances in building automated or semi-automated data quality, data cleaning and data integration systems powered by machine learning (ML) …
We introduce Saga, a next-generation knowledge construction and serving platform for powering knowledge-based applications at industrial scale. Saga follows a hybrid batch …
Given a dataset with incomplete data (e.g., missing values), training a machine learning model over the incomplete data requires two steps. First, it requires a data-effective step that …
H Zhang, Y Dong, C Xiao… - Proceedings of the 2024 …, 2024 - aclanthology.org
This paper explores the utilization of LLMs for data preprocessing (DP), a crucial step in the data mining pipeline that transforms raw data into a clean format. We instruction-tune local …
Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often desires to publish a synthetic …