Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Contrastive representation learning: A framework and review

PH Le-Khac, G Healy, AF Smeaton - IEEE Access, 2020 - ieeexplore.ieee.org
Contrastive Learning has recently received interest due to its success in self-supervised
representation learning in the computer vision domain. However, the origins of Contrastive …
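The snippet names the contrastive approach without stating its objective. Below is a minimal sketch of InfoNCE, one common contrastive loss of the kind such reviews cover; the function name, batch layout, and temperature value are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(queries, keys, temperature=0.1):
    """Minimal InfoNCE contrastive loss (sketch).

    queries, keys: (N, D) tensors where queries[i] and keys[i] form a
    positive pair; every other key in the batch acts as a negative.
    """
    # Normalize so dot products become cosine similarities.
    queries = F.normalize(queries, dim=1)
    keys = F.normalize(keys, dim=1)
    # (N, N) similarity matrix, scaled by a temperature hyperparameter.
    logits = queries @ keys.t() / temperature
    # The positive for row i sits in column i, so targets are 0..N-1.
    targets = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, targets)
```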

Attention Is All You Need (NIPS), 2017

A Vaswani, N Shazeer, N Parmar, J Uszkoreit… - arXiv preprint arXiv …, 2017 - codetds.com
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture …
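The attention mechanism the abstract refers to is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as defined in the paper. A minimal PyTorch sketch follows; the function name and optional mask argument are our own framing.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Similarity of each query to each key, scaled by sqrt(d_k).
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    # Weighted combination of the values.
    return weights @ v
```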

LLaMA: Open and efficient foundation language models

H Touvron, T Lavril, G Izacard, X Martinet… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B
parameters. We train our models on trillions of tokens, and show that it is possible to train …

StarCoder: may the source be with you!

R Li, LB Allal, Y Zi, N Muennighoff, D Kocetkov… - arXiv preprint arXiv …, 2023 - arxiv.org
The BigCode community, an open-scientific collaboration working on the responsible
development of Large Language Models for Code (Code LLMs), introduces StarCoder and …

Flamingo: a visual language model for few-shot learning

JB Alayrac, J Donahue, P Luc… - Advances in neural …, 2022 - proceedings.neurips.cc
Building models that can be rapidly adapted to novel tasks using only a handful of annotated
examples is an open challenge for multimodal machine learning research. We introduce …

LaMDA: Language models for dialog applications

R Thoppilan, D De Freitas, J Hall, N Shazeer… - arXiv preprint arXiv …, 2022 - arxiv.org
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …

Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model

S Smith, M Patwary, B Norick, P LeGresley… - arXiv preprint arXiv …, 2022 - arxiv.org
Pretrained general-purpose language models can achieve state-of-the-art accuracies in
various natural language processing domains by adapting to downstream tasks via zero …

Improving language models by retrieving from trillions of tokens

S Borgeaud, A Mensch, J Hoffmann… - International …, 2022 - proceedings.mlr.press
We enhance auto-regressive language models by conditioning on document chunks
retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 …
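The retrieval step described here, conditioning on corpus chunks selected by local similarity with the preceding tokens, can be illustrated with a toy nearest-neighbor search. RETRO itself embeds chunks with a frozen BERT model and uses an approximate index over a multi-trillion-token database; the brute-force cosine search below is a deliberate simplification, and all names are hypothetical.

```python
import numpy as np

def retrieve_neighbors(context_emb, chunk_embs, k=2):
    """Return indices of the k corpus chunks most similar to the context.

    context_emb: (D,) embedding of the preceding token chunk.
    chunk_embs:  (N, D) precomputed embeddings of corpus chunks.
    """
    # Cosine similarity via normalized dot products.
    chunks = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    ctx = context_emb / np.linalg.norm(context_emb)
    sims = chunks @ ctx
    # Highest-similarity chunks first.
    return np.argsort(-sims)[:k]
```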

Scaling language models: Methods, analysis & insights from training Gopher

JW Rae, S Borgeaud, T Cai, K Millican… - arXiv preprint arXiv …, 2021 - arxiv.org
Language modelling provides a step towards intelligent communication systems by
harnessing large repositories of written human knowledge to better predict and understand …