PatentGPT: A Large Language Model for Intellectual Property

Z Bai, R Zhang, L Chen, Q Cai, Y Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, large language models (LLMs) have attracted significant attention due to
their exceptional performance across a multitude of natural language processing tasks, and …

Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition

A Basharin, A Chertkov, I Oseledets - arXiv preprint arXiv:2410.17765, 2024 - arxiv.org
We propose a new model for multi-token prediction in transformers, aiming to enhance
sampling efficiency without compromising accuracy. Motivated by recent work that predicts …

Monet: Mixture of Monosemantic Experts for Transformers

J Park, YJ Ahn, KE Kim, J Kang - arXiv preprint arXiv:2412.04139, 2024 - arxiv.org
Understanding the internal computations of large language models (LLMs) is crucial for
aligning them with human values and preventing undesirable behaviors like toxic content …

Architecture Design: From Neural Networks to Foundation Models

G Chrysos - 2024 IEEE 11th International Conference on Data …, 2024 - ieeexplore.ieee.org
Historically, we have been taught to use task-dependent architecture designs and objectives to
tackle data science tasks. Counterintuitively, this dogma has been proven (partly) wrong by …