Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing

P Liu, W Yuan, J Fu, Z Jiang, H Hayashi… - ACM Computing …, 2023 - dl.acm.org
This article surveys and organizes research works in a new paradigm in natural language
processing, which we dub “prompt-based learning.” Unlike traditional supervised learning …

Deep learning-based text classification: a comprehensive review

S Minaee, N Kalchbrenner, E Cambria… - ACM Computing …, 2021 - dl.acm.org
Deep learning-based models have surpassed classical machine learning-based
approaches in various text classification tasks, including sentiment analysis, news …

Swin Transformer V2: Scaling up capacity and resolution

Z Liu, H Hu, Y Lin, Z Yao, Z Xie, Y Wei… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present techniques for scaling Swin Transformer up to 3 billion parameters and
making it capable of training with images of up to 1,536×1,536 resolution. By scaling up …

BEiT: BERT pre-training of image transformers

H Bao, L Dong, S Piao, F Wei - arXiv preprint arXiv:2106.08254, 2021 - arxiv.org
We introduce a self-supervised vision representation model BEiT, which stands for
Bidirectional Encoder representation from Image Transformers. Following BERT developed …

Video Swin Transformer

Z Liu, J Ning, Y Cao, Y Wei, Z Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …

FLAVA: A foundational language and vision alignment model

A Singh, R Hu, V Goswami… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …

BARTScore: Evaluating generated text as text generation

W Yuan, G Neubig, P Liu - Advances in Neural Information …, 2021 - proceedings.neurips.cc
A wide variety of NLP applications, such as machine translation, summarization, and dialog,
involve text generation. One major challenge for these applications is how to evaluate …

Swin Transformer: Hierarchical vision transformer using shifted windows

Z Liu, Y Lin, Y Cao, H Hu, Y Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …

VLMo: Unified vision-language pre-training with mixture-of-modality-experts

H Bao, W Wang, L Dong, Q Liu… - Advances in …, 2022 - proceedings.neurips.cc
We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual
encoder and a fusion encoder with a modular Transformer network. Specifically, we …

Multi-scale vision longformer: A new vision transformer for high-resolution image encoding

P Zhang, X Dai, J Yang, B Xiao… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision
Longformer, which significantly enhances the ViT of Dosovitskiy et al. for encoding high-resolution …