ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images

HQ Pham, TKB Nguyen, Q Van Nguyen… - arXiv preprint arXiv …, 2024 - arxiv.org
Optical Character Recognition-Visual Question Answering (OCR-VQA) is the task of
answering text information contained in images that have just been significantly developed …

New benchmark dataset and fine-grained cross-modal fusion framework for Vietnamese multimodal aspect-category sentiment analysis

QH Nguyen, MVT Nguyen, K Van Nguyen - Multimedia Systems, 2025 - Springer
The emergence of multimodal data on social media platforms presents new opportunities to
better understand user sentiments toward a given aspect. However, existing multimodal …

Answering, Fast and Slow: Strategy enhancement of visual understanding guided by causality

C Wang, Z Wang, Y Zhou - Neurocomputing, 2025 - Elsevier
In his classic book Thinking, Fast and Slow (Daniel, 2017), Kahneman points out that human
thinking can be categorized into two main modes of thinking: a system that displays intuition …

ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images

Q Van Nguyen, DQ Tran, HQ Pham… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual Question Answering (VQA) is a complicated task that requires the capability of
simultaneously processing natural language and images. Initially, this task was researched …

ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering

NH Nguyen, TT Quan, NLT Nguyen - arXiv preprint arXiv:2410.14132, 2024 - arxiv.org
Text-based VQA is a challenging task that requires machines to use scene texts in given
images to yield the most appropriate answer for the given question. The main challenge of …

Generative Pre‐Trained Transformer for Vietnamese Community‐Based COVID‐19 Question Answering

TM Vo, KV Tran - … and Multidisciplinary IT Solutions for Society, 2024 - Wiley Online Library
Recent studies have provided empirical evidence of the wide‐ranging potential of
Generative Pre‐trained Transformer (GPT), a pre‐trained language model, in the field of …