Demystifying softmax gating function in Gaussian mixture of experts

H Nguyen, TT Nguyen, N Ho - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
Understanding parameter estimation in the softmax gating Gaussian mixture of experts has
remained a long-standing open problem in the literature. It is mainly due to three …
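
For orientation, the model under study is a covariate-dependent mixture in which a softmax gate weights k Gaussian experts. A generic form (the parameter names below are illustrative, not necessarily the paper's notation):

\[
f(y \mid x) \;=\; \sum_{i=1}^{k} \frac{\exp(\beta_{1i}^{\top} x + \beta_{0i})}{\sum_{j=1}^{k} \exp(\beta_{1j}^{\top} x + \beta_{0j})} \,\mathcal{N}\!\left(y \,\middle|\, a_i^{\top} x + b_i,\; \sigma_i^{2}\right)
\]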

Is infinity that far? A Bayesian nonparametric perspective of finite mixture models

R Argiento, M De Iorio - The Annals of Statistics, 2022 - projecteuclid.org
The Annals of Statistics, Vol. 50, No. 5, pp. 2641–2663. https://doi.org/10.1214/22-AOS2201

Selective inference for k-means clustering

YT Chen, DM Witten - Journal of Machine Learning Research, 2023 - jmlr.org
We consider the problem of testing for a difference in means between clusters of
observations identified via k-means clustering. In this setting, classical hypothesis tests lead …
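
A minimal sketch of the failure mode the abstract alludes to (an assumed setup, not the paper's corrected test): all data below come from a single Gaussian, yet a standard t-test comparing k-means clusters rejects far more often than the nominal 5% level, because the same data both define and test the clusters.

# Naive "double dipping": cluster first, then test the clusters on the same data.
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pvals = []
for _ in range(200):
    X = rng.normal(size=(100, 2))  # no true cluster structure at all
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    # Classical two-sample t-test on the first coordinate between the
    # discovered clusters, reusing the data that defined them.
    t, p = stats.ttest_ind(X[labels == 0, 0], X[labels == 1, 0])
    pvals.append(p)
print("fraction of p < 0.05 under the global null:", np.mean(np.array(pvals) < 0.05))
# Far above 0.05; selective inference conditions on the clustering event to fix this.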

Estimating the number of components in finite mixture models via the Group-Sort-Fuse procedure

T Manole, A Khalili - The Annals of Statistics, 2021 - projecteuclid.org
The Annals of Statistics, Vol. 49, No. 6, pp. 3043–3069. https://doi.org/10.1214/21-AOS2072

Statistical perspective of top-k sparse softmax gating mixture of experts

H Nguyen, P Akbarian, F Yan, N Ho - arXiv preprint arXiv:2309.13850, 2023 - arxiv.org
Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive
deep-learning architectures without increasing the computational cost. Despite its popularity …
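
A minimal NumPy sketch of the gating mechanism (generic, not tied to any specific MoE codebase): the softmax is renormalized over only the K largest logits, so the remaining experts receive exactly zero weight and need not be evaluated.

import numpy as np

def topk_softmax_gate(logits, k):
    # logits: (num_experts,) gating scores for one input
    top = np.argpartition(logits, -k)[-k:]       # indices of the K largest logits
    w = np.exp(logits[top] - logits[top].max())  # numerically stable softmax over top-K
    w /= w.sum()
    gate = np.zeros_like(logits)
    gate[top] = w                                # all other experts get weight 0
    return gate

logits = np.array([0.2, 1.5, -0.3, 0.9])
print(topk_softmax_gate(logits, k=2))            # mass only on experts 1 and 3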

Towards convergence rates for parameter estimation in Gaussian-gated mixture of experts

H Nguyen, TT Nguyen, K Nguyen… - International Conference on Artificial Intelligence and Statistics, 2024 - proceedings.mlr.press
Originally introduced as a neural network for ensemble learning, mixture of experts (MoE)
has recently become a fundamental building block of highly successful modern deep neural …
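
In a Gaussian-gated MoE, the gate is itself proportional to a Gaussian density in the input rather than a softmax of linear scores; with generic, illustrative parameter names:

\[
f(y \mid x) \;=\; \sum_{i=1}^{k} \frac{\pi_i \,\mathcal{N}(x \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{k} \pi_j \,\mathcal{N}(x \mid \mu_j, \Sigma_j)} \,\mathcal{N}\!\left(y \,\middle|\, a_i^{\top} x + b_i,\; \sigma_i^{2}\right)
\]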

Refined convergence rates for maximum likelihood estimation under finite mixture models

T Manole, N Ho - International Conference on Machine Learning, 2022 - proceedings.mlr.press
We revisit the classical problem of deriving convergence rates for the maximum likelihood
estimator (MLE) in finite mixture models. The Wasserstein distance has become a standard …
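
For concreteness, the Wasserstein distance here is taken between mixing measures (discrete distributions over component parameters), not between data densities. A first-order sketch on scalar atoms using SciPy (the atoms and weights below are illustrative):

import numpy as np
from scipy.stats import wasserstein_distance

# True mixing measure: two atoms; estimate: an overfitted three-atom measure.
true_atoms, true_weights = np.array([-1.0, 2.0]), np.array([0.5, 0.5])
est_atoms, est_weights = np.array([-0.9, 1.8, 2.1]), np.array([0.45, 0.30, 0.25])
# W1 between the two discrete measures on the real line.
print(wasserstein_distance(true_atoms, est_atoms, true_weights, est_weights))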

Finite mixture models do not reliably learn the number of components

D Cai, T Campbell, T Broderick - International Conference on Machine Learning, 2021 - proceedings.mlr.press
Scientists and engineers are often interested in learning the number of subpopulations (or
components) present in a data set. A common suggestion is to use a finite mixture model …
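
A minimal illustration of the failure mode, using a BIC sweep as a stand-in (the paper's analysis concerns Bayesian posteriors over the component count, not BIC): when the component family is misspecified, the selected number of components typically grows with the sample size rather than stabilizing at the true subpopulation count.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
for n in (200, 20000):
    X = rng.laplace(size=(n, 1))  # one non-Gaussian "component"
    bics = [GaussianMixture(k, random_state=0).fit(X).bic(X) for k in range(1, 9)]
    print(n, "-> K chosen by BIC:", 1 + int(np.argmin(bics)))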

Task-agnostic online reinforcement learning with an infinite mixture of Gaussian processes

M Xu, W Ding, J Zhu, Z Liu, B Chen… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
Continuously learning to solve unseen tasks with limited experience has been extensively
pursued in meta-learning and continual learning, but with restricted assumptions such as …

On excess mass behavior in Gaussian mixture models with Orlicz-Wasserstein distances

A Guha, N Ho, XL Nguyen - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Dirichlet Process mixture models (DPMM) in combination with Gaussian kernels have been
an important modeling tool for numerous data domains arising from biological, physical, and …
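
A small sketch of the modeling setup using scikit-learn's truncated Dirichlet process Gaussian mixture (illustrative of the phenomenon, not of the paper's Orlicz-Wasserstein analysis): with two well-separated true clusters, most of the allowed components receive near-zero weight, and that leftover mass on spurious components is the "excess mass" whose decay the paper quantifies.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Data from 2 well-separated Gaussians; the model is allowed up to 10 components.
X = np.concatenate([rng.normal(-3, 1, (500, 1)), rng.normal(3, 1, (500, 1))])
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)
# Two dominant weights; the rest are tiny but nonzero (the excess mass).
print(np.round(np.sort(dpgmm.weights_)[::-1], 3))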