From speaker to dubber: movie dubbing with prosody and duration consistency learning

Z Zhang, L Li, G Cong, H Yin, Y Gao, C Yan… - Proceedings of the …, 2024 - dl.acm.org
Movie Dubbing aims to convert scripts into speech that aligns with the given movie clip in
both temporal and emotional aspects while preserving the vocal timbre of one brief …

Training-free video temporal grounding using large-scale pre-trained models

M Zheng, X Cai, Q Chen, Y Peng, Y Liu - European Conference on …, 2024 - Springer
Video temporal grounding aims to identify video segments within untrimmed videos that are
most relevant to a given natural language query. Existing video temporal localization models …

It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment

J Zheng, X Liu, B Zhang, C Yan, J Zhang… - Proceedings of the …, 2024 - dl.acm.org
Existing studies on gait recognition primarily use sequences of either binary silhouettes or
human parsing to encode the shapes and dynamics of persons during walking. Silhouettes …

Stochastic Context Consistency Reasoning for Domain Adaptive Object Detection

Y Cui, L Li, J Zhang, C Yan, H Wang, S Wang… - Proceedings of the …, 2024 - dl.acm.org
Domain Adaptive Object Detection (DAOD) aims to improve the adaptation of a detector from a
labeled source domain to an unlabeled target domain. Recent advances leverage a …

EventHDR: From Event to High-Speed HDR Videos and Beyond

Y Zou, Y Fu, T Takatani, Y Zheng - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Event cameras are innovative neuromorphic sensors that asynchronously capture the scene
dynamics. Due to the event-triggering mechanism, such cameras record event streams with …

Mitigate Catastrophic Remembering via Continual Knowledge Purification for Noisy Lifelong Person Re-Identification

K Xu, H Zhang, Y Li, Y Peng, J Zhou - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Current Lifelong Person Re-Identification (LReID) methods focus on tackling a clean data
stream with accurate labels. When noisy data with incorrect labels are given, their …

Progressive Prototype Evolving for Dual-Forgetting Mitigation in Non-Exemplar Online Continual Learning

Q Li, Y Peng, J Zhou - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Online Continual Learning (OCL) aims to learn a model from a sequence of single-pass
data, usually encountering the challenges of catastrophic forgetting both between …

InsVP: Efficient Instance Visual Prompting from Image Itself

Z Liu, Y Peng, J Zhou - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Visual prompting is an efficient methodology for finetuning pretrained visual models by
introducing a small number of learnable parameters while keeping the backbone frozen …

Object-Aware NIR-to-Visible Translation

Y Gao, L Gu, Q Liu, Y Fu - European Conference on Computer Vision, 2024 - Springer
While near-infrared (NIR) imaging is essential for assisted driving and safety monitoring
systems, its monochromatic nature hinders its broader application, which prompts the …

Privacy-enhanced prototype-based federated cross-modal hashing for cross-modal retrieval

R Zuo, C Zheng, F Li, L Zhu, Z Zhang - ACM Transactions on Multimedia …, 2024 - dl.acm.org
Cross-modal hashing is widely used for efficient similarity searches, improving data
processing efficiency, and reducing storage costs. Existing cross-modal hashing methods …