Towards continual reinforcement learning: A review and perspectives

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org
In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arXiv preprint arXiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …

Scaling vision with sparse mixture of experts

C Riquelme, J Puigcerver, B Mustafa… - Advances in …, 2021 - proceedings.neurips.cc
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated excellent
scalability in Natural Language Processing. In Computer Vision, however, almost all …

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

W Fedus, B Zoph, N Shazeer - Journal of Machine Learning Research, 2022 - jmlr.org
In deep learning, models typically reuse the same parameters for all inputs. Mixture of
Experts (MoE) models defy this and instead select different parameters for each incoming …
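As a rough illustration of the per-input parameter selection described in this snippet, here is a minimal NumPy sketch of switch-style top-1 routing. The layer sizes, expert count, and variable names are illustrative assumptions, not taken from the Switch Transformer paper or its code.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, n_tokens = 8, 4, 5

    # Router: a single linear projection scores every expert for each token.
    router_w = rng.normal(size=(d_model, n_experts))
    # Each "expert" here is just an independent linear layer.
    expert_ws = rng.normal(size=(n_experts, d_model, d_model))

    def switch_layer(tokens):
        logits = tokens @ router_w                        # (n_tokens, n_experts)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)        # softmax over experts
        chosen = probs.argmax(axis=-1)                    # top-1 expert per token
        out = np.zeros_like(tokens)
        for e in range(n_experts):
            mask = chosen == e
            if mask.any():
                # Only the selected expert's parameters touch these tokens, so
                # per-token compute stays flat as more experts are added.
                out[mask] = (tokens[mask] @ expert_ws[e]) * probs[mask, e:e + 1]
        return out, chosen

    tokens = rng.normal(size=(n_tokens, d_model))
    out, chosen = switch_layer(tokens)
    print("expert chosen per token:", chosen)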

Multi-task learning with deep neural networks: A survey

M Crawshaw - arXiv preprint arXiv:2009.09796, 2020 - arxiv.org
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are
simultaneously learned by a shared model. Such approaches offer advantages like …
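The "shared model" this snippet refers to is most commonly realized as hard parameter sharing: a shared trunk feeding one head per task. A minimal NumPy sketch under that assumption (layer sizes and the two tasks are placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden = 16, 32

    shared_w = 0.1 * rng.normal(size=(d_in, d_hidden))   # trunk shared by all tasks
    head_a_w = 0.1 * rng.normal(size=(d_hidden, 3))      # task-A-specific head
    head_b_w = 0.1 * rng.normal(size=(d_hidden, 1))      # task-B-specific head

    def forward(x):
        h = np.maximum(x @ shared_w, 0.0)   # shared representation (ReLU)
        return h @ head_a_w, h @ head_b_w   # one output per task

    x = rng.normal(size=(4, d_in))
    pred_a, pred_b = forward(x)
    # During training the per-task losses are combined (e.g. a weighted sum),
    # so gradients from every task update the shared trunk jointly.
    print(pred_a.shape, pred_b.shape)   # (4, 3) (4, 1)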

DSelect-k: Differentiable selection in the mixture of experts with applications to multi-task learning

H Hazimeh, Z Zhao, A Chowdhery… - Advances in …, 2021 - proceedings.neurips.cc
The Mixture-of-Experts (MoE) architecture is showing promising results in improving
parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks …

Unified scaling laws for routed language models

A Clark, D de Las Casas, A Guy… - International …, 2022 - proceedings.mlr.press
The performance of a language model has been shown to be effectively modeled as a
power-law in its parameter count. Here we study the scaling behaviors of Routing Networks …
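The power law mentioned in the snippet is typically written, in generic form (the symbols below are not the paper's exact parameterization, which additionally accounts for routing and expert count):

    L(N) \approx a \, N^{-\alpha}

where L is the validation cross-entropy loss, N is the parameter count, and a, \alpha > 0 are empirically fitted constants; on a log-log plot this appears as a straight line with slope -\alpha.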

Patch-level routing in mixture-of-experts is provably sample-efficient for convolutional neural networks

MNR Chowdhury, S Zhang, M Wang… - International …, 2023 - proceedings.mlr.press
In deep learning, mixture-of-experts (MoE) activates one or a few experts (sub-networks) on a
per-sample or per-token basis, resulting in significant computation reduction. The recently …
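To make the computation reduction concrete, here is a NumPy sketch of per-token top-k routing, generalizing the top-1 example given earlier in this list; the expert count, k, and shapes are illustrative assumptions rather than the cited paper's patch-level construction.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, k, n_tokens = 8, 8, 2, 6

    gate_w = rng.normal(size=(d, n_experts))
    experts = rng.normal(size=(n_experts, d, d))

    def topk_moe(tokens):
        logits = tokens @ gate_w                              # (n_tokens, n_experts)
        topk = np.argsort(logits, axis=-1)[:, -k:]            # k experts per token
        sel = np.take_along_axis(logits, topk, axis=-1)
        w = np.exp(sel - sel.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                    # softmax over the k picks
        out = np.zeros_like(tokens)
        for t in range(tokens.shape[0]):
            for j, e in enumerate(topk[t]):
                out[t] += w[t, j] * (tokens[t] @ experts[e])  # only k expert matmuls
        return out

    out = topk_moe(rng.normal(size=(n_tokens, d)))
    # Each token touches k of the n_experts weight matrices, so expert compute
    # scales with k rather than with the total number of experts.
    print(out.shape)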

Self-routing capsule networks

T Hahn, M Pyeon, G Kim - Advances in neural information …, 2019 - proceedings.neurips.cc
Capsule networks have recently gained a great deal of interest as a new neural network
architecture that can be more robust to input perturbations than similar-sized CNNs …

Mixed signals: Sign language production via a mixture of motion primitives

B Saunders, NC Camgoz… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
It is common practice to represent spoken languages at their phonetic level. However, for
sign languages, this implies breaking motion into its constituent motion primitives. Avatar …