Towards continual reinforcement learning: A review and perspectives

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org
In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arXiv preprint arXiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …

Scaling vision with sparse mixture of experts

C Riquelme, J Puigcerver, B Mustafa… - Advances in …, 2021 - proceedings.neurips.cc
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated excellent
scalability in Natural Language Processing. In Computer Vision, however, almost all …

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

W Fedus, B Zoph, N Shazeer - Journal of Machine Learning Research, 2022 - jmlr.org
In deep learning, models typically reuse the same parameters for all inputs. Mixture of
Experts (MoE) models defy this and instead select different parameters for each incoming …
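As a rough illustration of the per-input parameter selection described in this snippet, here is a minimal NumPy sketch of switch-style top-1 routing. The layer sizes, expert count, and variable names are illustrative assumptions, not taken from the Switch Transformer paper or its code.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, n_tokens = 8, 4, 5

    # Router: a single linear projection scores every expert for each token.
    router_w = rng.normal(size=(d_model, n_experts))
    # Each "expert" here is just an independent linear layer.
    expert_ws = rng.normal(size=(n_experts, d_model, d_model))

    def switch_layer(tokens):
        logits = tokens @ router_w                        # (n_tokens, n_experts)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)        # softmax over experts
        chosen = probs.argmax(axis=-1)                    # top-1 expert per token
        out = np.zeros_like(tokens)
        for e in range(n_experts):
            mask = chosen == e
            if mask.any():
                # Only the selected expert's parameters touch these tokens, so
                # per-token compute stays flat as more experts are added.
                out[mask] = (tokens[mask] @ expert_ws[e]) * probs[mask, e:e + 1]
        return out, chosen

    tokens = rng.normal(size=(n_tokens, d_model))
    out, chosen = switch_layer(tokens)
    print("expert chosen per token:", chosen)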

Multi-task learning with deep neural networks: A survey

M Crawshaw - arXiv preprint arXiv:2009.09796, 2020 - arxiv.org
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are
simultaneously learned by a shared model. Such approaches offer advantages like …
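The "shared model" this snippet refers to is most commonly realized as hard parameter sharing: a shared trunk feeding one head per task. A minimal NumPy sketch under that assumption (layer sizes and the two tasks are placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden = 16, 32

    shared_w = 0.1 * rng.normal(size=(d_in, d_hidden))   # trunk shared by all tasks
    head_a_w = 0.1 * rng.normal(size=(d_hidden, 3))      # task-A-specific head
    head_b_w = 0.1 * rng.normal(size=(d_hidden, 1))      # task-B-specific head

    def forward(x):
        h = np.maximum(x @ shared_w, 0.0)   # shared representation (ReLU)
        return h @ head_a_w, h @ head_b_w   # one output per task

    x = rng.normal(size=(4, d_in))
    pred_a, pred_b = forward(x)
    # During training the per-task losses are combined (e.g. a weighted sum),
    # so gradients from every task update the shared trunk jointly.
    print(pred_a.shape, pred_b.shape)   # (4, 3) (4, 1)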

DSelect-k: Differentiable selection in the mixture of experts with applications to multi-task learning

H Hazimeh, Z Zhao, A Chowdhery… - Advances in …, 2021 - proceedings.neurips.cc
The Mixture-of-Experts (MoE) architecture is showing promising results in improving
parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks …

Unified scaling laws for routed language models

A Clark, D de Las Casas, A Guy… - International …, 2022 - proceedings.mlr.press
The performance of a language model has been shown to be effectively modeled as a
power-law in its parameter count. Here we study the scaling behaviors of Routing Networks …
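The power law mentioned in the snippet is typically written, in generic form (the symbols below are not the paper's exact parameterization, which additionally accounts for routing and expert count):

    L(N) \approx a \, N^{-\alpha}

where L is the validation cross-entropy loss, N is the parameter count, and a, \alpha > 0 are empirically fitted constants; on a log-log plot this appears as a straight line with slope -\alpha.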

Patch-level routing in mixture-of-experts is provably sample-efficient for convolutional neural networks

MNR Chowdhury, S Zhang, M Wang… - International …, 2023 - proceedings.mlr.press
In deep learning, mixture-of-experts (MoE) activates one or a few experts (sub-networks) on a
per-sample or per-token basis, resulting in significant computation reduction. The recently …
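To make the computation reduction concrete, here is a NumPy sketch of per-token top-k routing, generalizing the top-1 example given earlier in this list; the expert count, k, and shapes are illustrative assumptions rather than the cited paper's patch-level construction.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, k, n_tokens = 8, 8, 2, 6

    gate_w = rng.normal(size=(d, n_experts))
    experts = rng.normal(size=(n_experts, d, d))

    def topk_moe(tokens):
        logits = tokens @ gate_w                              # (n_tokens, n_experts)
        topk = np.argsort(logits, axis=-1)[:, -k:]            # k experts per token
        sel = np.take_along_axis(logits, topk, axis=-1)
        w = np.exp(sel - sel.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                    # softmax over the k picks
        out = np.zeros_like(tokens)
        for t in range(tokens.shape[0]):
            for j, e in enumerate(topk[t]):
                out[t] += w[t, j] * (tokens[t] @ experts[e])  # only k expert matmuls
        return out

    out = topk_moe(rng.normal(size=(n_tokens, d)))
    # Each token touches k of the n_experts weight matrices, so expert compute
    # scales with k rather than with the total number of experts.
    print(out.shape)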

Self-routing capsule networks

T Hahn, M Pyeon, G Kim - Advances in neural information …, 2019 - proceedings.neurips.cc
Capsule networks have recently gained a great deal of interest as a new neural network
architecture that can be more robust to input perturbations than similar-sized CNNs …

Mixed signals: Sign language production via a mixture of motion primitives

B Saunders, NC Camgoz… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
It is common practice to represent spoken languages at their phonetic level. However, for
sign languages, this implies breaking motion into its constituent motion primitives. Avatar …