The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the …
W Zhao, H Wu, W He, H Bi, H Wang… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Due to their inherent interactivity, time-sync comments on videos have attracted increasing attention and have been widely adopted on online video platforms. In addition to enhancing user …
D Pei, D Huang, L Kong, Y Wang - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Group Activity Recognition (GAR) is a challenging task, where modeling spatio-temporal relationships among participants plays a fundamental role. To address this issue, we …
M Tan, Z Wen, L Fang, Q Wu - ACM Transactions on Multimedia …, 2023 - dl.acm.org
Visual Relational Reasoning is the basis of many vision-and-language based tasks (e.g., visual question answering and referring expression comprehension). In this article, we …
L Li, Y Zhang, L Yuan, X Gao - IEEE Transactions on Circuits …, 2024 - ieeexplore.ieee.org
Recent CNN-driven face super-resolution (FSR) technologies have achieved excellent breakthroughs by incorporating facial prior knowledge. However, most of them suffer from …
P Zhao, Y Chen, Y Zhao, W Jia, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Automatic image colorization is inherently an ill-posed problem with uncertainty, which requires an accurate semantic understanding of scenes to estimate reasonable colors for …