PaLI-X: On scaling up a multilingual vision and language model

X Chen, J Djolonga, P Padlewski, B Mustafa… - arXiv preprint arXiv …, 2023 - arxiv.org
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and
language model, both in terms of size of the components and the breadth of its training task …

On Scaling Up a Multilingual Vision and Language Model

X Chen, J Djolonga, P Padlewski… - Proceedings of the …, 2024 - openaccess.thecvf.com
We explore the boundaries of scaling up a multilingual vision and language model both in
terms of size of the components and the breadth of its training task mixture. Our model …

PaLI: A jointly-scaled multilingual language-image model

X Chen, X Wang, S Changpinyo… - arXiv preprint arXiv …, 2022 - arxiv.org
Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …

Qwen-VL: A frontier large vision-language model with versatile abilities

J Bai, S Bai, S Yang, S Wang, S Tan, P Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …

12-in-1: Multi-task vision and language representation learning

J Lu, V Goswami, M Rohrbach… - Proceedings of the …, 2020 - openaccess.thecvf.com
Much of vision-and-language research focuses on a small but diverse set of independent
tasks and supporting datasets often studied in isolation; however, the visually-grounded …

Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond

J Bai, S Bai, S Yang, S Wang, S Tan, P Wang, J Lin… - 2023 - openreview.net
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Language is not all you need: Aligning perception with language models

S Huang, L Dong, W Wang, Y Hao… - Advances in …, 2024 - proceedings.neurips.cc
A big convergence of language, multimodal perception, action, and world modeling is a key
step toward artificial general intelligence. In this work, we introduce KOSMOS-1, a …

Image as a foreign language: BEiT pretraining for all vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck, Z Peng, Q Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state …

LAVIS: A library for language-vision intelligence

D Li, J Li, H Le, G Wang, S Savarese… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research
and applications. LAVIS aims to serve as a one-stop comprehensive library that brings …