
OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework

Authors

Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang

Publication date

2022/6/28

Conference

International Conference on Machine Learning

Pages

23318-23340

Publisher

PMLR

Description

In this work, we pursue a unified paradigm for multimodal pretraining to break the shackles of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks, including image generation, visual grounding, image captioning, image classification, language modeling, etc., in a simple sequence-to-sequence learning framework. OFA follows the instruction-based learning in both pretraining and finetuning stages, requiring no extra task-specific layers for downstream tasks. In comparison with the recent state-of-the-art vision & language models that rely on extremely large cross-modal datasets, OFA is pretrained on only 20M publicly available image-text pairs. Despite its simplicity and relatively small-scale training data, OFA achieves new SOTAs in a series of cross …

Total citations

Cited by 899

Citations per year: 2022: 118, 2023: 477, 2024: 304
