Deep learning on a data diet: Finding important examples early in training

M Paul, S Ganguli… - Advances in neural …, 2021 - proceedings.neurips.cc
Recent success in deep learning has partially been driven by training increasingly
overparametrized networks on ever larger datasets. It is therefore natural to ask: how much …

Deepcore: A comprehensive library for coreset selection in deep learning

C Guo, B Zhao, Y Bai - International Conference on Database and Expert …, 2022 - Springer
Coreset selection, which aims to select a subset of the most informative training samples, is
a long-standing learning problem that can benefit many downstream tasks such as data …
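
As a rough illustration of what "select a subset of the most informative training samples" means operationally, here is a minimal score-then-threshold sketch. It is generic and assumed: `select_coreset` and its arguments are not DeepCore's actual API, and the choice of score (loss, gradient norm, forgetting count, ...) is exactly where the methods collected by the library differ.

```python
import numpy as np

def select_coreset(scores: np.ndarray, fraction: float) -> np.ndarray:
    """Return indices of the highest-scoring training examples.

    `scores` is any per-example importance measure; `fraction` is the
    kept budget in (0, 1].
    """
    budget = max(1, int(round(fraction * len(scores))))
    # argsort is ascending, so the last `budget` indices are the top scores.
    return np.argsort(scores)[-budget:]

# Toy usage: keep the 10% of 1,000 examples with the largest scores.
scores = np.random.default_rng(0).random(1000)
subset_idx = select_coreset(scores, fraction=0.1)
print(len(subset_idx))  # 100
```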

Dataset distillation with convexified implicit gradients

N Loo, R Hasani, M Lechner… - … Conference on Machine …, 2023 - proceedings.mlr.press
We propose a new dataset distillation algorithm using reparameterization and
convexification of implicit gradients (RCIG) that substantially improves the state-of-the-art …

Improved Coresets for Euclidean k-Means

V Cohen-Addad, K Green Larsen… - Advances in …, 2022 - proceedings.neurips.cc
Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp.
Euclidean $k$-median) consists of finding $k$ centers such that the sum of squared …
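
For reference, the objectives the snippet alludes to, written in standard notation (a reconstruction, not a quotation from the paper): for a point set $P \subset \mathbb{R}^d$ and a candidate center set $C$ with $|C| = k$,

```latex
\mathrm{cost}_{k\text{-means}}(P, C) = \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert_2^{2},
\qquad
\mathrm{cost}_{k\text{-median}}(P, C) = \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert_2,
```

and the problem is to choose the $C$ minimizing the respective cost.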

A new coreset framework for clustering

V Cohen-Addad, D Saulpic… - Proceedings of the 53rd …, 2021 - dl.acm.org
Given a metric space, the (k, z)-clustering problem consists of finding k centers such that the
sum of distances raised to the power z of every point to its closest center is minimized …
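
Written out in symbols (standard notation, matching the description in the snippet): given a metric space $(X, \mathrm{dist})$ and input points $P$, the $(k, z)$-clustering objective is

```latex
\min_{C \subseteq X,\ |C| = k} \; \sum_{p \in P} \min_{c \in C} \mathrm{dist}(p, c)^{z},
```

so $z = 1$ recovers $k$-median and $z = 2$ recovers $k$-means.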

Towards optimal lower bounds for k-median and k-means coresets

V Cohen-Addad, KG Larsen, D Saulpic… - Proceedings of the 54th …, 2022 - dl.acm.org
The (k, z)-clustering problem consists of finding a set of k points called centers, such that the
sum of distances raised to the power of z of every data point to its closest center is …

The power of uniform sampling for coresets

V Braverman, V Cohen-Addad… - 2022 IEEE 63rd …, 2022 - ieeexplore.ieee.org
Motivated by practical generalizations of the classic k-median and k-means objectives, such
as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce …

New frameworks for offline and streaming coreset constructions

V Braverman, D Feldman, H Lang, A Statman… - arXiv preprint arXiv …, 2016 - arxiv.org
A coreset for a set of points is a small subset of weighted points that approximately
preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ …
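
The snippet is cut off before the formal statement; the standard $\varepsilon$-coreset guarantee it is building toward (assumed here, not quoted from the paper) is that the weighted subset $Q$, with weights $w$, satisfies for every candidate solution $C$

```latex
(1 - \varepsilon)\,\mathrm{cost}(P, C) \;\le\; \sum_{q \in Q} w(q)\,\mathrm{cost}(q, C) \;\le\; (1 + \varepsilon)\,\mathrm{cost}(P, C).
```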

Sharper Bounds for Sensitivity Sampling

D Woodruff, T Yasuda - International Conference on …, 2023 - proceedings.mlr.press
In large scale machine learning, random sampling is a popular way to approximate datasets
by a small representative subset of examples. In particular, sensitivity sampling is an …
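
For orientation, here is the generic form of sensitivity sampling (the textbook scheme, not the paper's sharpened bounds): draw points with probability proportional to their sensitivities and reweight so that additive costs are preserved in expectation. The sensitivity values themselves, whose computation and tightness are the actual subject of the paper, are taken as given; the proxy below is purely illustrative.

```python
import numpy as np

def sensitivity_sample(points, sensitivities, m, rng=None):
    """Draw m points with probability proportional to their sensitivities
    and return them with unbiased importance weights 1 / (m * p_i)."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(sensitivities, dtype=float)
    probs = probs / probs.sum()
    idx = rng.choice(len(points), size=m, replace=True, p=probs)
    return points[idx], 1.0 / (m * probs[idx])

# Toy usage with a made-up sensitivity proxy (not a real sensitivity bound).
pts = np.random.default_rng(1).normal(size=(10_000, 5))
sens = 1.0 + np.linalg.norm(pts, axis=1)
sample, weights = sensitivity_sample(pts, sens, m=500)
print(sample.shape, weights.shape)  # (500, 5) (500,)
```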

Compressing neural networks: Towards determining the optimal layer-wise decomposition

L Liebenwein, A Maalouf… - Advances in Neural …, 2021 - proceedings.neurips.cc
We present a novel global compression framework for deep neural networks that
automatically analyzes each layer to identify the optimal per-layer compression ratio, while …
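
To make "per-layer compression ratio" concrete, here is a generic truncated-SVD factorization of a single dense layer at a given ratio. This is a common form of layer-wise decomposition, not the paper's method; in particular, it does nothing to choose the ratio automatically, which is the paper's contribution.

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, keep_ratio: float):
    """Factor a dense weight matrix W (out_dim x in_dim) into A @ B whose
    combined parameter count is roughly keep_ratio * W.size."""
    out_dim, in_dim = W.shape
    # Rank that keeps approximately the requested fraction of parameters.
    rank = max(1, int(keep_ratio * out_dim * in_dim / (out_dim + in_dim)))
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # shape (out_dim, rank)
    B = Vt[:rank, :]             # shape (rank, in_dim)
    return A, B                  # layer forward pass becomes A @ (B @ x)

W = np.random.default_rng(2).normal(size=(512, 1024))
A, B = low_rank_factorize(W, keep_ratio=0.25)
print(A.shape, B.shape)  # (512, 85) (85, 1024)
```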