Audio visual scene-aware dialog system using dynamic memory networks

S Yoon, E Yoon, HS Yoon, J Kim, CD Yoo - arXiv preprint arXiv …, 2022 - arxiv.org

Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question
regarding a given video and dialogue context. Despite the recent success of multi-modal …

Cited by 16 · Related articles · All 6 versions

Structured co-reference graph attention for video-grounded dialogue

J Kim, S Yoon, D Kim, CD Yoo - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org

A video-grounded dialogue system referred to as the Structured Co-reference Graph
Attention (SCGA) is presented for decoding the answer sequence to a question regarding a …

Cited by 26 · Related articles · All 8 versions

HEAR: Hearing enhanced audio response for video-grounded dialogue

S Yoon, D Kim, E Yoon, HS Yoon, J Kim… - arXiv preprint arXiv …, 2023 - arxiv.org

Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal
input comprising video, audio, and dialogue history. Although there have been numerous …

Cited by 7 · Related articles · All 5 versions

DialogMCF: Multimodal context flow for audio visual scene-aware dialog

Z Chen, H Liu, Y Wang - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org

In recent years, Audio Visual Scene-Aware Dialog (AVSD) has been an active research task
in the multimodal dialogue community and has also been a core part of the Dialog System …

Cited by 7 · Related articles · All 2 versions

Video dialog as conversation about objects living in space-time

HA Pham, TM Le, V Le, TM Phuong, T Tran - European Conference on …, 2022 - Springer

It would be a technological feat to be able to create a system that can hold a meaningful
conversation with humans about what they watch. A setup toward that goal is presented as a …

Cited by 9 · Related articles · All 10 versions

MSG-BART: Multi-Granularity Scene Graph-Enhanced Encoder-Decoder Language Model for Video-Grounded Dialogue Generation

H Liu, Z Chen, H Li, P Wang, Y Wang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Generating dialogue grounded in videos requires a high level of understanding and
reasoning about the visual scenes in the videos. However, existing large visual-language …

Cited by 1 · Related articles · All 3 versions

Uncovering Hidden Connections: Iterative Tracking and Reasoning for Video-grounded Dialog

H Zhang, M Liu, Y Wang, D Cao, W Guan… - arXiv preprint arXiv …, 2023 - arxiv.org

In contrast to conventional visual question answering, video-grounded dialog necessitates a
profound understanding of both dialog history and video content for accurate response …

Cited by 2 · Related articles · All 2 versions

Revisiting audio visual scene-aware dialog

A Liu, H Xie, X Liu, Z Yin, S Liu - Neurocomputing, 2022 - Elsevier

Audio Visual Scene-Aware Dialog (AVSD) has drawn intense interests, in which
models are required to understand dynamic scenes in videos and dialog contexts in order to …

Cited by 3 · Related articles · All 2 versions

A Coarse and Fine Grained Masking Approach for Video-Grounded Dialogue

F Xu, W Zhou, T Sun, J Lu, Z Yu, G Li - International Conference on …, 2024 - Springer

The task of Video-Grounded Dialogue involves developing a multimodal chatbot
capable of answering sequential questions from humans regarding video content, audio …

Enhancing Cross-Modal Understanding for Audio Visual Scene-Aware Dialog Through Contrastive Learning

F Xu, W Zhou, G Li, Z Zhong… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

Audio Visual Scene-Aware Dialog is a task where a robot answers questions based on short
video and audio content as well as dialog history. Although previous studies try to improve …
