Can you rely on your model evaluation? Improving model evaluation with synthetic test data

B van Breugel, N Seedat, F Imrie… - Advances in Neural …, 2024 - proceedings.neurips.cc
Evaluating the performance of machine learning models on diverse and underrepresented
subgroups is essential for ensuring fairness and reliability in real-world applications …

Navigating data-centric artificial intelligence with DC-Check: Advances, challenges, and opportunities

N Seedat, F Imrie… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Data-centric AI is an emerging paradigm that emphasizes the critical role of data in
real-world machine learning (ML) systems, as a complement to model development. However …

TRIAGE: Characterizing and auditing training data for improved regression

N Seedat, J Crabbé, Z Qian… - Advances in Neural …, 2024 - proceedings.neurips.cc
Data quality is crucial for robust machine learning algorithms, with the recent interest in
data-centric AI emphasizing the importance of training data characterization. However, current …

Query-dependent prompt evaluation and optimization with offline inverse RL

H Sun, A Hüyük, M van der Schaar - The Twelfth International …, 2023 - openreview.net
In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models
(LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective …

Reimagining synthetic tabular data generation through data-centric AI: A comprehensive benchmark

L Hansen, N Seedat… - Advances in Neural …, 2023 - proceedings.neurips.cc
Synthetic data serves as an alternative in training machine learning models, particularly
when real-world data is limited or inaccessible. However, ensuring that synthetic data …

DMLR: Data-centric Machine Learning Research--Past, Present and Future

L Oala, M Maskey, L Bat-Leah, A Parrish… - arXiv preprint arXiv …, 2023 - arxiv.org
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings
prior, in this report we outline the relevance of community engagement and infrastructure …

Unified fair federated learning for digital healthcare

F Zhang, Z Shuai, K Kuang, F Wu, Y Zhuang, J Xiao - Patterns, 2024 - cell.com
Federated learning (FL) is a promising approach for healthcare institutions to train
high-quality medical models collaboratively while protecting sensitive data privacy. However, FL …

Curated LLM: Synergy of LLMs and data curation for tabular augmentation in ultra low-data regimes

N Seedat, N Huynh, B van Breugel… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine Learning (ML) in low-data settings remains an underappreciated yet crucial
problem. This challenge is pronounced in low-to-middle income countries where access to …

Dissecting sample hardness: A fine-grained analysis of hardness characterization methods for data-centric AI

N Seedat, F Imrie, M van der Schaar - arXiv preprint arXiv:2403.04551, 2024 - arxiv.org
Characterizing samples that are difficult to learn from is crucial to developing highly
performant ML models. This has led to numerous Hardness Characterization Methods …

Selective Learning: Towards Robust Calibration with Dynamic Regularization

Z Han, Y Yang, C Zhang, L Zhang, JT Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Miscalibration in deep learning refers to a discrepancy between the predicted
confidence and performance. This problem usually arises due to the overfitting problem …