Hero: Hierarchical spatio-temporal reasoning with contrastive action correspondence for end-to-en...

J Miao, Z Yang, L Fan, Y Yang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Federated Learning (FL) is a distributed learning paradigm that collaboratively learns a
global model across multiple clients with data privacy-preserving. Although many FL …

Cited by 21 Related articles All 4 versions

Winner: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Cited by 22 Related articles All 3 versions

Revisiting the domain shift and sample uncertainty in multi-source active domain transfer

W Zhang, Z Lv, H Zhou, JW Liu, J Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a
new target domain by actively selecting a limited number of target data to annotate. This …

Cited by 7 Related articles All 3 versions

Are binary annotations sufficient? video moment retrieval via hierarchical uncertainty-based active learning

W Ji, R Liang, Z Zheng, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …

Cited by 20 Related articles All 7 versions

Panoptic scene graph generation with semantics-prototype learning

L Li, W Ji, Y Wu, M Li, Y Qin, L Wei… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships
(predicate) to connect human language and visual scenes. However, different language …

Cited by 13 Related articles All 3 versions

Gradient-regulated meta-prompt learning for generalizable vision-language models

J Li, M Gao, L Wei, S Tang, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-
training models to adapt to downstream tasks in a parameter-and data-efficient way, by …

Cited by 14 Related articles All 5 versions

Video-audio domain generalization via confounder disentanglement

S Zhang, X Feng, W Fan, W Fang, F Feng… - Proceedings of the …, 2023 - ojs.aaai.org

Existing video-audio understanding models are trained and evaluated in an intra-domain
setting, facing performance degeneration in real-world applications where multiple domains …

Cited by 7 Related articles All 2 versions

Multi-modal action chain abductive reasoning

M Li, T Wang, J Xu, K Han, S Zhang… - Proceedings of the …, 2023 - aclanthology.org

Abductive Reasoning, has long been considered to be at the core ability of humans, which
enables us to infer the most plausible explanation of incomplete known phenomena in daily …

Cited by 7 Related articles All 2 versions

Learning in imperfect environment: Multi-label classification with long-tailed distribution and partial labels

W Zhang, C Liu, L Zeng, B Ooi… - Proceedings of the …, 2023 - openaccess.thecvf.com

Conventional multi-label classification (MLC) methods assume that all samples are fully
labeled and identically distributed. Unfortunately, this assumption is unrealistic in large …

Cited by 8 Related articles All 6 versions

Unsupervised domain adaptation for video object grounding with cascaded debiasing learning

M Li, H Zhang, J Li, Z Zhao, W Zhang, S Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org

This paper addresses the Unsupervised Domain Adaptation (UDA) for the dense frame
prediction task-Video Object Grounding (VOG). This investigation springs from the …

Cited by 4 Related articles
