H Wang, B Guo, Y Zeng, Y Ding, C Qiu, Y Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
The intelligent dialogue system, aiming at communicating with humans harmoniously with natural language, is brilliant for promoting the advancement of human-machine interaction …
S Yoon, D Kim, E Yoon, HS Yoon, J Kim… - arXiv preprint arXiv …, 2023 - arxiv.org
Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous …
H Le, NF Chen, SCH Hoi - arXiv preprint arXiv:2206.07898, 2022 - arxiv.org
Designed for tracking user goals in dialogues, a dialogue state tracker is an essential component in a dialogue system. However, the research of dialogue state tracking has …
We present \(\mathbb{MST}_{\mathbb{MIXER}}\), a novel video dialog model operating over a generic multi-modal state tracking scheme. Current models that claim to …
We study video-grounded dialogue generation, where a response is generated based on the dialogue context and the associated video. The primary challenges of this task lie in (1) …
It would be a technological feat to be able to create a system that can hold a meaningful conversation with humans about what they watch. A setup toward that goal is presented as a …
H Zhang, M Liu, Y Wang, D Cao, W Guan… - arXiv preprint arXiv …, 2023 - arxiv.org
In contrast to conventional visual question answering, video-grounded dialog necessitates a profound understanding of both dialog history and video content for accurate response …
Video-Grounded Dialogue System (VGDS), focusing on generating reasonable responses based on multi-turn dialogue contexts and a given video, has received intensive …
A Abdessaied, M von Hochmeister, A Bulling - arXiv preprint arXiv …, 2024 - arxiv.org
We present the Object Language Video Transformer (OLViT), a novel model for video dialog operating over a multi-modal attention-based dialog state tracker. Existing video dialog …