Sharp global convergence guarantees for iterative nonconvex optimization with random data. The Annals of Statistics 2023, Vol. 51, No. 1, 179–210 https://doi.org/10.1214/22-AOS2246 …
Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive deep-learning architectures without increasing the computational cost. Despite its popularity …
Transformers with multi-head self-attention have achieved remarkable success in sequence modeling and beyond. However, they suffer from high computational and memory …
SR Kasa, V Rajan - Scientific Reports, 2023 - nature.com
Clustering is a fundamental tool for exploratory data analysis, and is ubiquitous across scientific disciplines. Gaussian Mixture Model (GMM) is a popular probabilistic and …
Q Mai, X Zhang, Y Pan, K Deng - Journal of the American Statistical …, 2022 - Taylor & Francis
Modern scientific studies often collect datasets in the form of tensors. These datasets call for innovative statistical analysis methods. In particular, there is a pressing need for tensor …
We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the specified rank is larger than the true rank. We refer to this …
Y Wu, HH Zhou - Mathematical Statistics and Learning, 2021 - ems.press
We analyze the classical EM algorithm for parameter estimation in the symmetric two-component Gaussian mixtures in d dimensions. We show that, even in the absence of any …
J Kwon, N Ho, C Caramanis - International Conference on …, 2021 - proceedings.mlr.press
We study the convergence rates of the EM algorithm for learning two-component mixed linear regression under all regimes of signal-to-noise ratio (SNR). We resolve a long …
T Manole, N Ho - International Conference on Machine …, 2022 - proceedings.mlr.press
We revisit the classical problem of deriving convergence rates for the maximum likelihood estimator (MLE) in finite mixture models. The Wasserstein distance has become a standard …