Demystifying softmax gating function in Gaussian mixture of experts

H Nguyen, TT Nguyen, N Ho - Advances in Neural …, 2023 - proceedings.neurips.cc
Understanding the parameter estimation of softmax gating Gaussian mixture of experts has
remained a long-standing open problem in the literature. It is mainly due to three …

Sharp global convergence guarantees for iterative nonconvex optimization with random data

KA Chandrasekher, A Pananjady… - The Annals of …, 2023 - projecteuclid.org
The Annals of Statistics 2023, Vol. 51, No. 1, 179–210. https://doi.org/10.1214/22-AOS2246 …

Statistical perspective of top-k sparse softmax gating mixture of experts

H Nguyen, P Akbarian, F Yan, N Ho - arXiv preprint arXiv:2309.13850, 2023 - arxiv.org
Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive
deep-learning architectures without increasing the computational cost. Despite its popularity …

Improving transformer with an admixture of attention heads

T Nguyen, T Nguyen, H Do, K Nguyen… - Advances in neural …, 2022 - proceedings.neurips.cc
Transformers with multi-head self-attention have achieved remarkable success in sequence
modeling and beyond. However, they suffer from high computational and memory …

Avoiding inferior clusterings with misspecified Gaussian mixture models

SR Kasa, V Rajan - Scientific Reports, 2023 - nature.com
Clustering is a fundamental tool for exploratory data analysis, and is ubiquitous across
scientific disciplines. Gaussian Mixture Model (GMM) is a popular probabilistic and …

A doubly enhanced EM algorithm for model-based tensor clustering

Q Mai, X Zhang, Y Pan, K Deng - Journal of the American Statistical …, 2022 - Taylor & Francis
Modern scientific studies often collect datasets in the form of tensors. These datasets call for
innovative statistical analysis methods. In particular, there is a pressing need for tensor …

On the computational and statistical complexity of over-parameterized matrix sensing

J Zhuo, J Kwon, N Ho, C Caramanis - Journal of Machine Learning …, 2024 - jmlr.org
We consider solving the low-rank matrix sensing problem with the Factorized Gradient
Descent (FGD) method when the specified rank is larger than the true rank. We refer to this …

Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in O(√n) iterations

Y Wu, HH Zhou - Mathematical Statistics and Learning, 2021 - ems.press
We analyze the classical EM algorithm for parameter estimation in the symmetric two-
component Gaussian mixtures in d dimensions. We show that, even in the absence of any …

On the minimax optimality of the EM algorithm for learning two-component mixed linear regression

J Kwon, N Ho, C Caramanis - International Conference on …, 2021 - proceedings.mlr.press
We study the convergence rates of the EM algorithm for learning two-component mixed
linear regression under all regimes of signal-to-noise ratio (SNR). We resolve a long …

Refined convergence rates for maximum likelihood estimation under finite mixture models

T Manole, N Ho - International Conference on Machine …, 2022 - proceedings.mlr.press
We revisit the classical problem of deriving convergence rates for the maximum likelihood
estimator (MLE) in finite mixture models. The Wasserstein distance has become a standard …