A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $ P $ is a set of points, $ Q …
D Woodruff, T Yasuda - International Conference on …, 2023 - proceedings.mlr.press
In large scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an …
DP Woodruff, S Zhou - 2021 IEEE 62nd Annual Symposium on …, 2022 - ieeexplore.ieee.org
In the adversarially robust streaming model, a stream of elements is presented to an algorithm and is allowed to depend on the output of the algorithm at earlier times during the …
Robustness against adversarial attacks has recently been at the forefront of algorithmic design for machine learning tasks. In the adversarial streaming model, an adversary gives …
DP Woodruff, T Yasuda - Proceedings of the 2023 Annual ACM-SIAM …, 2023 - SIAM
The seminal work of Cohen and Peng [CP15] (STOC 2015) introduced Lewis weight sampling to the theoretical computer science community, which yields fast row sampling …
DP Woodruff, T Yasuda - Proceedings of the 55th Annual ACM …, 2023 - dl.acm.org
Subset selection for the rank k approximation of an n × d matrix A offers improvements in the interpretability of matrices, as well as a variety of computational savings. This problem is well …
We consider the classic Euclidean k-median and k-means objective on data streams, where the goal is to provide a (1+ε)-approximation to the optimal k-median or k-means solution …
T Mai, C Musco, A Rao - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We give relative error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss. Our construction achieves …
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications …