Data distillation: A survey

N Sachdeva, J McAuley - arXiv preprint arXiv:2301.04272, 2023 - arxiv.org
The popularity of deep learning has led to the curation of a vast number of massive and
multifarious datasets. Despite having close-to-human performance on individual tasks …

Repeated random sampling for minimizing the time-to-accuracy of learning

P Okanovic, R Waleffe, V Mageirakos… - arXiv preprint arXiv …, 2023 - arxiv.org
Methods for carefully selecting or generating a small set of training data to learn from, i.e.,
data pruning, coreset selection, and data distillation, have been shown to be effective in …

Explaining decision structures and data value for neural networks in crop yield prediction

M von Bloh, B Seiler, P van der Smagt… - Environmental …, 2024 - iopscience.iop.org
Neural networks are powerful machine learning models, but their reliability and trust are
often criticized due to the unclear nature of their internal learned relationships. We explored …

All models are wrong, some are useful: Model Selection with Limited Labels

P Okanovic, A Kirsch, J Kasper, T Hoefler… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce MODEL SELECTOR, a framework for label-efficient selection of pretrained
classifiers. Given a pool of unlabeled target data, MODEL SELECTOR samples a small …

Robust Data Pruning: Uncovering and Overcoming Implicit Bias

A Vysogorets, K Ahuja, J Kempe - arXiv preprint arXiv:2404.05579, 2024 - arxiv.org
In the era of exceptionally data-hungry models, careful selection of the training data is
essential to mitigate the extensive costs of deep learning. Data pruning offers a solution by …

Towards Data-efficient Machine Learning Systems

N Sachdeva - 2024 - search.proquest.com
The amount of data available to train modern machine learning systems has been
increasing rapidly, so much so that we're using, e.g., the entirety of the publicly available text data …

Repeated Random Sampling for Data Efficient Learning

P Okanovic - 2023 - research-collection.ethz.ch
Deep learning has shown great success in several areas, including speech recognition,
natural language processing, and computer vision, but its effectiveness significantly …