As Large Language Models (LLMs) become popular, there emerged an important trend of using multimodality to augment the LLMs' generation ability, which enables LLMs to better …
Task-oriented dialogue (TOD) systems have been widely used by mobile phone intelligent assistants to accomplish tasks such as calendar scheduling or hotel reservation. Current …
H Le, N Chen, S Hoi - Proceedings of the 2022 Conference of the …, 2022 - aclanthology.org
Neural module networks (NMN) have achieved success in image-grounded tasks such as Visual Question Answering (VQA) on synthetic images. However, very limited work on NMN …
H Wang, B Guo, Y Zeng, Y Ding, C Qiu, Y Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
The intelligent dialogue system, aiming at communicating with humans harmoniously with natural language, is brilliant for promoting the advancement of human-machine interaction …
The majority of traditional text-to-video retrieval systems operate in static environments, i.e., there is no interaction between the user and the agent beyond the initial textual query …
T Qiao, Q Men, FWB Li, Y Kubotani… - … on Computer Vision, 2022 - Springer
Human-Object Interaction (HOI) recognition in videos is important for analyzing human activity. Most existing work focusing on visual features usually suffers from occlusion …
A video-grounded dialogue system is required to understand both dialogue, which contains semantic dependencies from turn to turn, and video, which contains visual cues of spatial …
T Udagawa, A Aizawa - Transactions of the Association for …, 2021 - direct.mit.edu
Common grounding is the process of creating and maintaining mutual understandings, which is a critical aspect of sophisticated human communication. While various task settings …
H Le, NF Chen, SCH Hoi - arXiv preprint arXiv:2206.07898, 2022 - arxiv.org
Designed for tracking user goals in dialogues, a dialogue state tracker is an essential component in a dialogue system. However, the research of dialogue state tracking has …