Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content. Given limited GPU memory, training TAL end to end (i.e., from …
G Chen, Y Wu, S Liu, T Liu, X Du, F Wei - arXiv preprint arXiv:2308.12770, 2023 - arxiv.org
Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism. Alongside its …
Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning which is highly …
AI-generated video has revolutionized short video production, filmmaking, and personalized media, making video local editing an essential tool. However, this progress also blurs the …
L Zhao, H Li, X Ning, X Jiang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Cross-modal Steganography is the practice of concealing secret signals in publicly available cover signals (distinct from the modality of the secret signals) unobtrusively. While previous …
M Biswal, T Shao, K Rose, P Yin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Numerous studies have recently advanced the state of the art for representing videos through an implicit neural network (INR). As these models become increasingly ubiquitous …
J Wu, Z Wu, Y Xue, J Wen, W Peng - arXiv preprint arXiv:2404.10229, 2024 - arxiv.org
Recent advances in large language models (LLMs) have blurred the boundary of high-quality text generation between humans and machines, which is favorable for generative …
QW Gan, WC Yau, YS Gan, I Salam, S Guo… - Expert Systems with …, 2024 - Elsevier
Motion capture (mocap) data stores the skeleton movement of recorded objects or humans and is essential for various 3D applications, such as games, animations, virtual reality …
L Zhang, Y Lu, T Li, G Lu - IEEE Transactions on Industrial …, 2024 - ieeexplore.ieee.org
Transmission distortions within steganography systems easily cause dramatic degradations of revealing and invisibility performances. Previous works lacked sufficient adaptation for …