UniLMv2: Pseudo-masked language models for unified language model pre-training

Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing

P Liu, W Yuan, J Fu, Z Jiang, H Hayashi… - ACM Computing …, 2023 - dl.acm.org

This article surveys and organizes research works in a new paradigm in natural language
processing, which we dub “prompt-based learning.” Unlike traditional supervised learning …

Cited by 1393 · Related articles · All 5 versions

[PDF] arxiv.org

Deep learning-based text classification: a comprehensive review

S Minaee, N Kalchbrenner, E Cambria… - ACM computing …, 2021 - dl.acm.org

Deep learning-based models have surpassed classical machine learning-based
approaches in various text classification tasks, including sentiment analysis, news …

Cited by 994 · Related articles · All 10 versions

[PDF] thecvf.com

Swin Transformer V2: Scaling up capacity and resolution

Z Liu, H Hu, Y Lin, Z Yao, Z Xie, Y Wei… - Proceedings of the …, 2022 - openaccess.thecvf.com

We present techniques for scaling Swin Transformer up to 3 billion parameters and
making it capable of training with images of up to 1,536×1,536 resolution. By scaling up …

Cited by 525 · Related articles · All 7 versions

[PDF] arxiv.org

BEiT: BERT pre-training of image transformers

H Bao, L Dong, S Piao, F Wei - arXiv preprint arXiv:2106.08254, 2021 - arxiv.org

We introduce a self-supervised vision representation model BEiT, which stands for
Bidirectional Encoder representation from Image Transformers. Following BERT developed …

Cited by 1253 · Related articles · All 3 versions

[PDF] thecvf.com

Video Swin Transformer

Z Liu, J Ning, Y Cao, Y Wei, Z Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …

Cited by 764 · Related articles · All 9 versions

[PDF] thecvf.com

FLAVA: A foundational language and vision alignment model

A Singh, R Hu, V Goswami… - Proceedings of the …, 2022 - openaccess.thecvf.com

State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …

Cited by 219 · Related articles · All 7 versions

[PDF] neurips.cc

BARTScore: Evaluating generated text as text generation

W Yuan, G Neubig, P Liu - Advances in Neural Information …, 2021 - proceedings.neurips.cc

A wide variety of NLP applications, such as machine translation, summarization, and dialog,
involve text generation. One major challenge for these applications is how to evaluate …

Cited by 266 · Related articles · All 6 versions

[PDF] thecvf.com

Swin Transformer: Hierarchical vision transformer using shifted windows

Z Liu, Y Lin, Y Cao, H Hu, Y Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com

This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …

Cited by 8493 · Related articles · All 10 versions

[PDF] neurips.cc

VLMo: Unified vision-language pre-training with mixture-of-modality-experts

H Bao, W Wang, L Dong, Q Liu… - Advances in …, 2022 - proceedings.neurips.cc

We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual
encoder and a fusion encoder with a modular Transformer network. Specifically, we …

Cited by 146 · Related articles · All 5 versions

[PDF] thecvf.com

Multi-scale vision Longformer: A new vision transformer for high-resolution image encoding

P Zhang, X Dai, J Yang, B Xiao… - Proceedings of the …, 2021 - openaccess.thecvf.com

This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision
Longformer, which significantly enhances the original ViT for encoding high-resolution …

Cited by 209 · Related articles · All 6 versions
