Unifying vision-and-language tasks via text generation

J Cho, J Lei, H Tan, M Bansal - International Conference on …, 2021 - proceedings.mlr.press
Existing methods for vision-and-language learning typically require designing task-specific
architectures and objectives for each task. For example, a multi-label answer classifier for …

The contribution of knowledge in visiolinguistic learning: A survey on tasks and challenges

M Lymperaiou, G Stamou - arXiv preprint arXiv:2303.02411, 2023 - arxiv.org
Recent advancements in visiolinguistic (VL) learning have allowed the development of
multiple models and techniques that offer several impressive implementations, able to …

Prevention of Global Mental Health Crisis with Transformer Neural Networks

A Rajagopal, V Nirmala, J Andrew, MV Arun… - Artificial Intelligence for …, 2023 - Springer
The COVID-19 pandemic is causing monumental effects on mental wellbeing worldwide.
Literature calls for action to avert an impending global mental health crisis. This chapter …

Visual‐Text Reference Pretraining Model for Image Captioning

P Li, M Zhang, P Lin, J Wan… - Computational …, 2022 - Wiley Online Library
People can accurately describe an image by constantly referring to the visual information
and key text information of the image. Inspired by this idea, we propose the VTR‐PTM …

Vision-and-Language Pretraining for VQA

Q Wu, P Wang, X Wang, X He, W Zhu - Visual Question Answering: From …, 2022 - Springer
Multimodal (e.g., vision and language) pretraining has emerged as a popular topic, and many
representation learning models have been proposed in recent years. In this chapter, we …