D Ting, E Brochu - Advances in neural information …, 2018 - proceedings.neurips.cc
Subsampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated …
Z Xie, X Chen - Knowledge-Based Systems, 2022 - Elsevier
Partial least squares (PLS) performs well for high-dimensional regression problems, where the number of predictors can far exceed the number of observations. Similar to many other …
P Ma, X Sun - Wiley Interdisciplinary Reviews: Computational …, 2015 - Wiley Online Library
Rapid advance in science and technology in the past decade brings an extraordinary amount of data, offering researchers an unprecedented opportunity to tackle complex …
S Wu, X Zhu, H Wang - arXiv preprint arXiv:2304.06231, 2023 - arxiv.org
Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners …
X Zhang, J Wang, J Yin - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
In this paper, we aim to enable both efficient and accurate approximations on arbitrary sub-datasets of a large dataset. Due to the prohibitive storage overhead of caching offline …
R Zhu, P Ma, MW Mahoney, B Yu - arXiv preprint arXiv:1509.05111, 2015 - arxiv.org
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large …
Y Yao, HY Wang - Journal of Data Science, 2021 - airitilibrary.com
Subsampling is an effective way to deal with big data problems and many subsampling approaches have been proposed for different models, such as leverage sampling for linear …
J Kim, C Lee, Y Shin, S Park, M Kim, N Park… - Proceedings of the 28th …, 2022 - dl.acm.org
Score-based generative models (SGMs) are a recent breakthrough in generating fake images. SGMs are known to surpass other generative models, e.g., generative adversarial …
SM Xie, S Ermon - arXiv preprint arXiv:1901.10517, 2019 - arxiv.org
Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item …