Scene graph refinement network for visual question answering

T Qian, J Chen, S Chen, B Wu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Visual Question Answering aims to answer the free-form natural language question based
on the visual clues in a given image. It is a difficult problem as it requires understanding the …

Show me a video: A large-scale narrated video dataset for coherent story illustration

Y Lu, F Ni, H Wang, X Guo, L Zhu… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Illustrating a multi-sentence story with visual content is a significant challenge in multimedia
research. While previous works have focused on sequential story-to-visual representations …

Multimodal learning for temporally coherent talking face generation with articulator synergy

L Yu, H Xie, Y Zhang - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org
Talking face generation is a demanding task to synthesize a high quality video with accurate
lip synchronization and rhythmic head motion. However, existing methods always suffer from …

Dual cross-attention for video object segmentation via uncertainty refinement

J Hong, W Zhang, Z Feng… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
In this paper, we propose a novel approach to video object segmentation where dual
streams consisting of a shared network and a special network are designed to constitute the …

Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

K Liu, S Tang, Z Li, Z Li, L Bai, F Zhu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for
various tasks, such as movie parsing and identity-based movie editing. Related methods …

Generative Timelines for Instructed Visual Assembly

A Pardo, JH Wang, B Ghanem, J Sivic… - arXiv preprint arXiv …, 2024 - arxiv.org
The objective of this work is to manipulate visual timelines (e.g., a video) through natural
language instructions, making complex timeline editing tasks accessible to non-expert or …

Video Annotation & Descriptions using Machine Learning & Deep learning: Critical Survey of methods

P Kaushik, V Saxena - Proceedings of the 2023 Fifteenth International …, 2023 - dl.acm.org
Video description methods aim to produce the most relevant description of a video. This
could be description based on full video, frame based or based on important events of the …

AI video editing: A survey

X Zhang, Y Li, Y Han, J Wen - 2022 - preprints.org
Video editing is a high-required job, for it requires skilled artists or workers equipped with
plentiful physical strength and multidisciplinary knowledge, such as cinematography …

Representation Learning of Next Shot Selection for Vlog Editing

YZBGN Li, YZQWZ Yu - cveu.github.io
As vlog has become increasingly popular on video-sharing platforms, more amateurs have
participated in vlog creation. One critical step of attractive vlog creation is multi-shot …

AI Video Editing: a Survey. Preprints 2021, 1, 0

Z Xinrong, L Yanghao, H Yuxing… - IEEE Transactions on …, 2017 - scholar.archive.org
Video editing is a high-required job, for it requires skilled artists or workers equipped with
plentiful physical strength and multidisciplinary knowledge, such as cinematography …