Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data

B van Breugel, N Seedat, F Imrie… - Advances in Neural …, 2024 - proceedings.neurips.cc

Evaluating the performance of machine learning models on diverse and underrepresented
subgroups is essential for ensuring fairness and reliability in real-world applications …

Cited by 9 Related articles All 5 versions

Navigating data-centric artificial intelligence with DC-Check: Advances, challenges, and opportunities

N Seedat, F Imrie… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Data-centric AI is an emerging paradigm that emphasizes the critical role of data in real-
world machine learning (ML) systems—as a complement to model development. However …

Cited by 7 Related articles All 2 versions

[PDF] neurips.cc

TRIAGE: Characterizing and auditing training data for improved regression

N Seedat, J Crabbé, Z Qian… - Advances in Neural …, 2024 - proceedings.neurips.cc

Data quality is crucial for robust machine learning algorithms, with the recent interest in data-
centric AI emphasizing the importance of training data characterization. However, current …

Cited by 5 Related articles All 5 versions

[PDF] openreview.net

Query-dependent prompt evaluation and optimization with offline inverse RL

H Sun, A Hüyük, M van der Schaar - The Twelfth International …, 2023 - openreview.net

In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models
(LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective …

Cited by 16 Related articles All 3 versions

[PDF] neurips.cc

Reimagining synthetic tabular data generation through data-centric AI: A comprehensive benchmark

L Hansen, N Seedat… - Advances in Neural …, 2023 - proceedings.neurips.cc

Synthetic data serves as an alternative in training machine learning models, particularly
when real-world data is limited or inaccessible. However, ensuring that synthetic data …

Cited by 10 Related articles All 6 versions

[PDF] arxiv.org

DMLR: Data-centric Machine Learning Research--Past, Present and Future

L Oala, M Maskey, L Bat-Leah, A Parrish… - arXiv preprint arXiv …, 2023 - arxiv.org

Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings
prior, in this report we outline the relevance of community engagement and infrastructure …

Cited by 8 Related articles All 4 versions

[PDF] cell.com Full View

Unified fair federated learning for digital healthcare

F Zhang, Z Shuai, K Kuang, F Wu, Y Zhuang, J Xiao - Patterns, 2024 - cell.com

Federated learning (FL) is a promising approach for healthcare institutions to train high-
quality medical models collaboratively while protecting sensitive data privacy. However, FL …

Cited by 9 Related articles All 7 versions

[PDF] arxiv.org

Curated LLM: Synergy of LLMs and data curation for tabular augmentation in ultra low-data regimes

N Seedat, N Huynh, B van Breugel… - arXiv preprint arXiv …, 2023 - arxiv.org

Machine Learning (ML) in low-data settings remains an underappreciated yet crucial
problem. This challenge is pronounced in low-to-middle income countries where access to …

Cited by 6 Related articles All 3 versions

[PDF] arxiv.org

Dissecting sample hardness: A fine-grained analysis of hardness characterization methods for data-centric AI

N Seedat, F Imrie, M van der Schaar - arXiv preprint arXiv:2403.04551, 2024 - arxiv.org

Characterizing samples that are difficult to learn from is crucial to developing highly
performant ML models. This has led to numerous Hardness Characterization Methods …

Cited by 4 Related articles All 3 versions

[PDF] arxiv.org

Selective Learning: Towards Robust Calibration with Dynamic Regularization

Z Han, Y Yang, C Zhang, L Zhang, JT Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Miscalibration in deep learning refers to a discrepancy between the predicted
confidence and performance. This problem usually arises due to the overfitting problem …

Cited by 3 Related articles All 2 versions
