The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal …
In his classic book Thinking, Fast and Slow (Daniel, 2017), Kahneman points out that human thinking can be categorized into two main modes: a system that displays intuition …
Visual Question Answering (VQA) is a complex task that requires processing natural language and images simultaneously. Initially, this task was researched …
Text-based VQA is a challenging task that requires machines to use the scene text in a given image to produce the most appropriate answer to the given question. The main challenge of …
TM Vo, KV Tran - … and Multidisciplinary IT Solutions for Society, 2024 - Wiley Online Library
Recent studies have provided empirical evidence of the wide-ranging potential of Generative Pre-trained Transformer (GPT), a pre-trained language model, in the field of …