xGQA: Cross-lingual visual question answering

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org

Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Cited by 191 - Related articles - All 6 versions

[PDF] arxiv.org

PaLI: A jointly-scaled multilingual language-image model

X Chen, X Wang, S Changpinyo… - arXiv preprint arXiv …, 2022 - arxiv.org

Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …

Cited by 535 - Related articles - All 6 versions

[PDF] arxiv.org

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org

Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …

Cited by 84 - Related articles - All 5 versions

[PDF] arxiv.org

From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier

The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Cited by 9 - Related articles - All 2 versions

[PDF] mlr.press

IGLUE: A benchmark for transfer learning across modalities, tasks, and languages

E Bugliarello, F Liu, J Pfeiffer, S Reddy… - International …, 2022 - proceedings.mlr.press

Reliable evaluation benchmarks designed for replicability and comprehensiveness have
driven progress in machine learning. Due to the lack of a multilingual benchmark, however …

Cited by 54 - Related articles - All 5 versions

[PDF] arxiv.org

PaliGemma: A versatile 3B VLM for transfer

L Beyer, A Steiner, AS Pinto, A Kolesnikov… - arXiv preprint arXiv …, 2024 - arxiv.org

PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m
vision encoder and the Gemma-2B language model. It is trained to be a versatile and …

Cited by 18 - Related articles - All 2 versions

[PDF] arxiv.org

One country, 700+ languages: NLP challenges for underrepresented languages and dialects in Indonesia

AF Aji, GI Winata, F Koto, S Cahyawijaya… - arXiv preprint arXiv …, 2022 - arxiv.org

NLP research is impeded by a lack of resources and awareness of the challenges presented
by underrepresented languages and dialects. Focusing on the languages spoken in …

Cited by 64 - Related articles - All 10 versions

[PDF] arxiv.org

Large multilingual models pivot zero-shot multimodal learning across languages

J Hu, Y Yao, C Wang, S Wang, Y Pan, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently there has been a significant surge in multimodal learning in terms of both image-to-
text and text-to-image generation. However, the success is typically limited to English …

Cited by 25 - Related articles - All 3 versions

[PDF] arxiv.org

Adapters: A unified library for parameter-efficient and modular transfer learning

C Poth, H Sterz, I Paul, S Purkayastha… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce Adapters, an open-source library that unifies parameter-efficient and modular
transfer learning in large language models. By integrating 10 diverse adapter methods into a …

Cited by 27 - Related articles - All 3 versions

[PDF] aclanthology.org

Unifying cross-lingual and cross-modal modeling towards weakly supervised multilingual vision-language pre-training

Z Li, Z Fan, J Chen, Q Zhang, XJ Huang… - Proceedings of the 61st …, 2023 - aclanthology.org

Multilingual Vision-Language Pre-training (VLP) is a promising but challenging
topic due to the lack of large-scale multilingual image-text pairs. Existing works address the …

Cited by 10 - Related articles - All 3 versions
