A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arXiv preprint arXiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …

Scaling instruction-finetuned language models

HW Chung, L Hou, S Longpre, B Zoph, Y Tay… - Journal of Machine …, 2024 - jmlr.org
Finetuning language models on a collection of datasets phrased as instructions has been
shown to improve model performance and generalization to unseen tasks. In this paper we …

Scaling language-image pre-training via masking

Y Li, H Fan, R Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …

Crosslingual generalization through multitask finetuning

N Muennighoff, T Wang, L Sutawika, A Roberts… - arXiv preprint arXiv …, 2022 - arxiv.org
Multitask prompted finetuning (MTF) has been shown to help large language models
generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused …

PaLI: A jointly-scaled multilingual language-image model

X Chen, X Wang, S Changpinyo… - arXiv preprint arXiv …, 2022 - arxiv.org
Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …

AudioLM: a language modeling approach to audio generation

Z Borsos, R Marinier, D Vincent… - … ACM transactions on …, 2023 - ieeexplore.ieee.org
We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

K Krishna, Y Song, M Karpinska… - Advances in Neural …, 2024 - proceedings.neurips.cc
The rise in malicious usage of large language models, such as fake content creation and
academic plagiarism, has motivated the development of approaches that identify AI …

Efficiently scaling transformer inference

R Pope, S Douglas, A Chowdhery… - Proceedings of …, 2023 - proceedings.mlsys.org
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …

PaLM: Scaling language modeling with pathways

A Chowdhery, S Narang, J Devlin, M Bosma… - Journal of Machine …, 2023 - jmlr.org
Large language models have been shown to achieve remarkable performance across a
variety of natural language tasks using few-shot learning, which drastically reduces the …

Phenaki: Variable length video generation from open domain textual descriptions

R Villegas, M Babaeizadeh, PJ Kindermans… - International …, 2022 - openreview.net
We present Phenaki, a model capable of realistic video synthesis given a sequence of
textual prompts. Generating videos from text is particularly challenging due to the …