XGPT: cross-modal generative pre-training for image captioning. 2020

5 citing articles found (0.03 seconds)

Unifying vision-and-language tasks via text generation

J Cho, J Lei, H Tan, M Bansal - International Conference on …, 2021 - proceedings.mlr.press

Existing methods for vision-and-language learning typically require designing task-specific
architectures and objectives for each task. For example, a multi-label answer classifier for …

Cited by 570; all 6 versions

The contribution of knowledge in visiolinguistic learning: A survey on tasks and challenges

M Lymperaiou, G Stamou - arXiv preprint arXiv:2303.02411, 2023 - arxiv.org

Recent advancements in visiolinguistic (VL) learning have allowed the development of
multiple models and techniques that offer several impressive implementations, able to …

Cited by 6; all 3 versions

Prevention of Global Mental Health Crisis with Transformer Neural Networks

A Rajagopal, V Nirmala, J Andrew, MV Arun… - Artificial Intelligence for …, 2023 - Springer

The COVID-19 pandemic is causing monumental effects on mental wellbeing worldwide.
Literature calls for action to avert an impending global mental health crisis. This chapter …

Visual‐Text Reference Pretraining Model for Image Captioning

P Li, M Zhang, P Lin, J Wan… - Computational …, 2022 - Wiley Online Library

People can accurately describe an image by constantly referring to the visual information
and key text information of the image. Inspired by this idea, we propose the VTR‐PTM …

Cited by 2; all 11 versions

Vision-and-Language Pretraining for VQA

Q Wu, P Wang, X Wang, X He, W Zhu - Visual Question Answering: From …, 2022 - Springer

Multimodal (e.g., vision and language) pretraining has emerged as a popular topic, and many
representation learning models have been proposed in recent years. In this chapter, we …
