M Zhang, Y Zhou, Z Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Subsampling or subdata selection is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods which significantly …
D Ting, E Brochu - Advances in neural information …, 2018 - proceedings.neurips.cc
Subsampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated …
Z Xie, X Chen - Knowledge-Based Systems, 2022 - Elsevier
Partial least squares (PLS) performs well for high-dimensional regression problems, where the number of predictors can far exceed the number of observations. Similar to many other …
X Wu, Y Huo, H Ren, C Zou - Journal of the American Statistical …, 2024 - Taylor & Francis
In the big data era, subsampling or sub-data selection techniques are often adopted to extract a fraction of informative individuals from the massive data. Existing subsampling …
P Ma, X Sun - Wiley Interdisciplinary Reviews: Computational …, 2015 - Wiley Online Library
Rapid advance in science and technology in the past decade brings an extraordinary amount of data, offering researchers an unprecedented opportunity to tackle complex …
X Zhang, J Wang, J Yin - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
In this paper, we aim to enable both efficient and accurate approximations on arbitrary sub-datasets of a large dataset. Due to the prohibitive storage overhead of caching offline …
R Zhu, P Ma, MW Mahoney, B Yu - arXiv preprint arXiv:1509.05111, 2015 - arxiv.org
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large …
Y Yao, HY Wang - Journal of Data Science, 2021 - airitilibrary.com
Subsampling is an effective way to deal with big data problems and many subsampling approaches have been proposed for different models, such as leverage sampling for linear …
SM Xie, S Ermon - arXiv preprint arXiv:1901.10517, 2019 - arxiv.org
Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item …