T. Yue, L. Guo, J. Cheng, X. Gao, J. Liu. arXiv preprint arXiv:2410.10456, 2024.
In the era of Large Language Models (LLMs), Mixture-of-Experts (MoE) architectures offer a promising approach to managing computational costs while scaling up model parameters …