A review of distributed statistical inference

Y Gao, W Liu, H Wang, X Wang, Y Yan… - Statistical Theory and …, 2022 - Taylor & Francis
The rapid emergence of massive datasets in various fields poses a serious challenge to
traditional statistical methods. Meanwhile, it provides opportunities for researchers to …

Renewable quantile regression for streaming datasets

K Wang, H Wang, S Li - Knowledge-Based Systems, 2022 - Elsevier
Streaming data analysis has drawn much attention, where large amounts of data arrive in
streams. Because limited memory can only store a small batch of data, fast analysis without …

[BOOK][B] Statistical foundations of data science

J Fan, R Li, CH Zhang, H Zou - 2020 - taylorfrancis.com
Statistical Foundations of Data Science gives a thorough introduction to commonly used
statistical models, contemporary statistical machine learning techniques and algorithms …

Information-based optimal subdata selection for big data linear regression

HY Wang, M Yang, J Stufken - Journal of the American Statistical …, 2019 - Taylor & Francis
Extraordinary amounts of data are being produced in many branches of science. Proven
statistical methods are no longer applicable with extraordinarily large datasets due to …

Distributed Computing and Inference for Big Data

L Zhou, Z Gong, P Xiang - Annual Review of Statistics and Its …, 2023 - annualreviews.org
Data are distributed across different sites due to computing facility limitations or data privacy
considerations. Conventional centralized methods—those in which all datasets are stored …

Distributed estimation of principal eigenspaces

J Fan, D Wang, K Wang, Z Zhu - Annals of Statistics, 2019 - ncbi.nlm.nih.gov
Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts
latent principal factors that contribute to the most variation of the data. When data are stored …

Quantile regression under memory constraint

X Chen, W Liu, Y Zhang - 2019 - projecteuclid.org
The Annals of Statistics 2019, Vol. 47, No. 6, 3244–3273
https://doi.org/10.1214/18-AOS1777 © Institute of Mathematical Statistics …

Communication-efficient surrogate quantile regression for non-randomly distributed system

K Wang, B Zhang, F Alenezi, S Li - Information Sciences, 2022 - Elsevier
Distributed systems have been widely used to solve massive data analysis tasks. This article
targets quantile regression on distributed systems with non-randomly distributed massive …

Optimal subsampling for quantile regression in big data

H Wang, Y Ma - Biometrika, 2021 - academic.oup.com
We investigate optimal subsampling for quantile regression. We derive the asymptotic
distribution of a general subsampling estimator and then derive two versions of optimal …

Inference and uncertainty quantification for noisy matrix completion

Y Chen, J Fan, C Ma, Y Yan - Proceedings of the National …, 2019 - National Acad Sciences
Noisy matrix completion aims at estimating a low-rank matrix given only partial and
corrupted entries. Despite remarkable progress in designing efficient estimation algorithms …