Demystifying softmax gating function in Gaussian mixture of experts

H Nguyen, TT Nguyen, N Ho - Advances in Neural …, 2023 - proceedings.neurips.cc
Understanding the parameter estimation of softmax gating Gaussian mixture of experts has
remained a long-standing open problem in the literature. This is mainly due to three …
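
For orientation, the model in question can be written as the following conditional density, a generic sketch in standard MoE notation (the symbols k, \beta_i, a_i, b_i, \sigma_i are illustrative, not necessarily the paper's):

    f(y \mid x) = \sum_{i=1}^{k} \frac{\exp(\beta_i^{\top} x + \beta_{0i})}{\sum_{j=1}^{k} \exp(\beta_j^{\top} x + \beta_{0j})} \, \mathcal{N}\bigl(y \mid a_i^{\top} x + b_i,\; \sigma_i^{2}\bigr),

where the softmax weights gate k Gaussian regression experts.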

Estimating the number of components in finite mixture models via the Group-Sort-Fuse procedure

T Manole, A Khalili - The Annals of Statistics, 2021 - projecteuclid.org
The Annals of Statistics, 2021, Vol. 49, No. 6, pp. 3043–3069. https://doi.org/10.1214/21-AOS2072

Projection robust Wasserstein distance and Riemannian optimization

T Lin, C Fan, N Ho, M Cuturi… - Advances in neural …, 2020 - proceedings.neurips.cc
Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a
robust variant of the Wasserstein distance. Recent work suggests that this quantity is more …
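
For reference, PRW is commonly defined by projecting both measures onto a k-dimensional subspace and taking the largest Wasserstein distance over such projections; in generic notation (a sketch, with St(d, k) the Stiefel manifold of orthonormal d x k matrices):

    \mathcal{P}_k(\mu, \nu) = \max_{U \in \mathrm{St}(d, k)} W_2\bigl(U^{\top}_{\#}\mu,\; U^{\top}_{\#}\nu\bigr),

where U^{\top}_{\#}\mu is the pushforward of \mu under x \mapsto U^{\top} x.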

A Riemannian block coordinate descent method for computing the projection robust Wasserstein distance

M Huang, S Ma, L Lai - International Conference on …, 2021 - proceedings.mlr.press
The Wasserstein distance has become increasingly important in machine learning and deep
learning. Despite its popularity, the Wasserstein distance is hard to approximate because of …
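
To make the computation concrete, here is a minimal numpy sketch of the alternating idea behind such solvers: a Sinkhorn (entropic OT) step for a fixed projection, then a Riemannian gradient ascent step on the Stiefel manifold with a QR retraction. The helper names (sinkhorn, prw_alternating) and the scheme itself are illustrative simplifications, not the paper's exact RBCD algorithm.

    import numpy as np

    def sinkhorn(C, a, b, eta=0.1, iters=200):
        # Entropic OT: transport plan for cost matrix C with marginals a, b.
        K = np.exp(-C / eta)
        u = np.ones_like(a)
        for _ in range(iters):
            v = b / (K.T @ u)
            u = a / (K @ v)
        return u[:, None] * K * v[None, :]

    def prw_alternating(X, Y, k, steps=50, lr=0.05, eta=0.1):
        # Alternate between the OT plan on projected points and a
        # Riemannian ascent step on the projector U (illustrative sketch).
        n, d = X.shape
        m = Y.shape[0]
        a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
        U, _ = np.linalg.qr(np.random.randn(d, k))   # random Stiefel point
        for _ in range(steps):
            C = (((X @ U)[:, None, :] - (Y @ U)[None, :, :]) ** 2).sum(-1)
            pi = sinkhorn(C, a, b, eta)
            D = X[:, None, :] - Y[None, :, :]        # pairwise displacements
            V = np.einsum('ij,ijp,ijq->pq', pi, D, D)
            G = 2.0 * V @ U                          # gradient of tr(U^T V U)
            xi = G - U @ (U.T @ G + G.T @ U) / 2.0   # tangent projection
            U, _ = np.linalg.qr(U + lr * xi)         # ascent + QR retraction
        C = (((X @ U)[:, None, :] - (Y @ U)[None, :, :]) ** 2).sum(-1)
        pi = sinkhorn(C, a, b, eta)
        return np.sqrt((pi * C).sum()), U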

FuseMoE: Mixture-of-experts transformers for fleximodal fusion

X Han, H Nguyen, C Harris, N Ho, S Saria - arXiv preprint arXiv …, 2024 - arxiv.org
As machine learning models in critical fields increasingly grapple with multimodal data, they
face the dual challenges of handling a wide array of modalities, often incomplete due to …

Strong identifiability and optimal minimax rates for finite mixture estimation

P Heinrich, J Kahn - 2018 - projecteuclid.org
The Annals of Statistics, 2018, Vol. 46, No. 6A, pp. 2844–2870. https://doi.org/10.1214/17-AOS1641

Towards convergence rates for parameter estimation in Gaussian-gated mixture of experts

H Nguyen, TT Nguyen, K Nguyen… - … Conference on Artificial …, 2024 - proceedings.mlr.press
Originally introduced as a neural network for ensemble learning, mixture of experts (MoE)
has recently become a fundamental building block of highly successful modern deep neural …
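
For context, a Gaussian-gated MoE replaces the softmax gate with weights derived from Gaussian densities on the input; in generic notation (a sketch, not necessarily the paper's exact parameterization):

    f(y \mid x) = \sum_{i=1}^{k} \frac{\pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)} \, \mathcal{N}\bigl(y \mid a_i^{\top} x + b_i,\; \sigma_i^{2}\bigr).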

Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in iterations

Y Wu, HH Zhou - Mathematical Statistics and Learning, 2021 - ems.press
We analyze the classical EM algorithm for parameter estimation in symmetric two-component
Gaussian mixtures in d dimensions. We show that, even in the absence of any …
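
The model here is X ~ (1/2) N(\theta, I_d) + (1/2) N(-\theta, I_d), for which the EM update has a well-known closed form. A minimal sketch with random initialization (the function name is hypothetical, not the authors' code):

    import numpy as np

    def em_symmetric_2gmm(X, iters=100, seed=0):
        # EM for 0.5*N(theta, I) + 0.5*N(-theta, I), random initialization.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        theta = rng.standard_normal(d)
        for _ in range(iters):
            # E-step: P(z = +1 | x) = sigmoid(2<x, theta>), so the posterior
            # mean of the label z in {-1, +1} is tanh(<x, theta>).
            w = np.tanh(X @ theta)
            # M-step: theta = (1/n) * sum_i E[z_i] x_i.
            theta = (w[:, None] * X).mean(axis=0)
        return theta

By symmetry the mixture only identifies theta up to sign, so the iterate typically approaches either +theta or -theta.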

On the minimax optimality of the EM algorithm for learning two-component mixed linear regression

J Kwon, N Ho, C Caramanis - International Conference on …, 2021 - proceedings.mlr.press
We study the convergence rates of the EM algorithm for learning two-component mixed
linear regression under all regimes of signal-to-noise ratio (SNR). We resolve a long …
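
For the symmetric two-component model y_i = r_i x_i^T beta* + eps_i, with hidden sign r_i in {+1, -1} and eps_i ~ N(0, sigma^2), the EM iteration has the closed form below (generic notation; a sketch consistent with the standard formulation rather than the paper's exact display):

    \beta^{(t+1)} = \Bigl(\sum_{i=1}^{n} x_i x_i^{\top}\Bigr)^{-1} \sum_{i=1}^{n} \tanh\!\Bigl(\frac{y_i \, x_i^{\top} \beta^{(t)}}{\sigma^{2}}\Bigr)\, y_i \, x_i,

where the tanh factor is the posterior mean of the hidden sign and the SNR \lVert \beta^* \rVert / \sigma governs the behavior of the iteration.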

Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?

H Nguyen, P Akbarian, N Ho - arXiv preprint arXiv:2401.13875, 2024 - arxiv.org
Dense-to-sparse gating mixture of experts (MoE) has recently become an effective
alternative to the well-known sparse MoE. Rather than fixing the number of activated experts …
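
Concretely, dense-to-sparse gating scales the softmax logits by a temperature \tau (a generic sketch of the gate, not necessarily the paper's exact notation):

    g_i(x) = \frac{\exp\bigl((\beta_i^{\top} x + \beta_{0i}) / \tau\bigr)}{\sum_{j=1}^{k} \exp\bigl((\beta_j^{\top} x + \beta_{0j}) / \tau\bigr)},

so large \tau gives dense, nearly uniform routing, while \tau \to 0 concentrates the gate on the top-scoring experts; annealing \tau moves the router from dense to sparse during training.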