Sketching as a tool for numerical linear algebra

DP Woodruff - … and Trends® in Theoretical Computer Science, 2014 - nowpublishers.com
This survey highlights the recent advances in algorithms for numerical linear algebra that
have come from the technique of linear sketching, whereby given a matrix, one first …

Turning Big Data Into Tiny Data: Constant-Size Coresets for k-Means, PCA, and Projective Clustering

D Feldman, M Schmidt, C Sohler - SIAM Journal on Computing, 2020 - SIAM
We develop and analyze a method to reduce the size of a very large set of data points in a
high-dimensional Euclidean space R^d to a small set of weighted points such that the result …

Sketching data sets for large-scale learning: Keeping only what you need

R Gribonval, A Chatalic, N Keriven… - IEEE Signal …, 2021 - ieeexplore.ieee.org
Big data can be a blessing: with very large training data sets it becomes possible to perform
complex learning tasks with unprecedented accuracy. Yet, this improved performance …

A unified framework for approximating and clustering data

D Feldman, M Langberg - Proceedings of the forty-third annual ACM …, 2011 - dl.acm.org
Given a set F of n positive functions over a ground set X, we consider the problem of
computing x* that minimizes the expression ∑_{f∈F} f(x), over x ∈ X. A typical application is …

Improved Coresets for Euclidean k-Means

V Cohen-Addad, K Green Larsen… - Advances in …, 2022 - proceedings.neurips.cc
Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp.
Euclidean $k$-median) consists of finding $k$ centers such that the sum of squared …

Towards optimal lower bounds for k-median and k-means coresets

V Cohen-Addad, KG Larsen, D Saulpic… - Proceedings of the 54th …, 2022 - dl.acm.org
The (k, z)-clustering problem consists of finding a set of k points called centers, such that the
sum of distances raised to the power of z of every data point to its closest center is …

Practical coreset constructions for machine learning

O Bachem, M Lucic, A Krause - arXiv preprint arXiv:1703.06476, 2017 - arxiv.org
We investigate coresets (succinct, small summaries of large data sets) so that solutions found
on the summary are provably competitive with solutions found on the full data set. We provide …

Near-optimal column-based matrix reconstruction

C Boutsidis, P Drineas, M Magdon-Ismail - SIAM Journal on Computing, 2014 - SIAM
We consider low-rank reconstruction of a matrix using a subset of its columns and present
asymptotically optimal algorithms for both spectral norm and Frobenius norm reconstruction …

New frameworks for offline and streaming coreset constructions

V Braverman, D Feldman, H Lang, A Statman… - arXiv preprint arXiv …, 2016 - arxiv.org
A coreset for a set of points is a small subset of weighted points that approximately
preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q …

Core-sets: Updated survey

D Feldman - Sampling techniques for supervised or unsupervised …, 2020 - Springer
In optimization or machine learning problems we are given a set of items, usually points in
some metric space, and the goal is to minimize or maximize an objective function over some …