Less is better: Unweighted data subsampling via influence function

Z Wang, H Zhu, Z Dong, X He, SL Huang - Proceedings of the AAAI …, 2020 - aaai.org
In the time of Big Data, training complex models on large-scale data sets is challenging,
making it appealing to reduce data volume for saving computation resources by …

Model-free subsampling method based on uniform designs

M Zhang, Y Zhou, Z Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Subsampling or subdata selection is a useful approach in large-scale statistical learning.
Most existing studies focus on model-based subsampling methods which significantly …

Optimal subsampling with influence functions

D Ting, E Brochu - Advances in neural information …, 2018 - proceedings.neurips.cc
Subsampling is a common and often effective method to deal with the computational
challenges of large datasets. However, for most statistical models, there is no well-motivated …

Subsampling for partial least-squares regression via an influence function

Z Xie, X Chen - Knowledge-Based Systems, 2022 - Elsevier
Partial least squares (PLS) performs well for high-dimensional regression problems, where
the number of predictors can far exceed the number of observations. Similar to many other …

Optimal subsampling via predictive inference

X Wu, Y Huo, H Ren, C Zou - Journal of the American Statistical …, 2024 - Taylor & Francis
In the big data era, subsampling or sub-data selection techniques are often adopted to
extract a fraction of informative individuals from the massive data. Existing subsampling …

Leveraging for big data regression

P Ma, X Sun - Wiley Interdisciplinary Reviews: Computational …, 2015 - Wiley Online Library
Rapid advance in science and technology in the past decade brings an extraordinary
amount of data, offering researchers an unprecedented opportunity to tackle complex …

Sapprox: Enabling efficient and accurate approximations on sub-datasets with distribution-aware online sampling

X Zhang, J Wang, J Yin - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
In this paper, we aim to enable both efficient and accurate approximations on arbitrary
sub-datasets of a large dataset. Due to the prohibitive storage overhead of caching offline …

Optimal subsampling approaches for large sample linear regression

R Zhu, P Ma, MW Mahoney, B Yu - arXiv preprint arXiv:1509.05111, 2015 - arxiv.org
A significant hurdle for analyzing large sample data is the lack of effective statistical
computing and inference methods. An emerging powerful approach for analyzing large …

A review on optimal subsampling methods for massive datasets

Y Yao, HY Wang - Journal of Data Science, 2021 - airitilibrary.com
Subsampling is an effective way to deal with big data problems and many subsampling
approaches have been proposed for different models, such as leverage sampling for linear …

Reparameterizable subset sampling via continuous relaxations

SM Xie, S Ermon - arXiv preprint arXiv:1901.10517, 2019 - arxiv.org
Many machine learning tasks require sampling a subset of items from a collection based on
a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item …