FedSeg: Class-heterogeneous federated learning for semantic segmentation

J Miao, Z Yang, L Fan, Y Yang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Federated Learning (FL) is a distributed learning paradigm that collaboratively learns a
global model across multiple clients while preserving data privacy. Although many FL …

WINNER: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Revisiting the domain shift and sample uncertainty in multi-source active domain transfer

W Zhang, Z Lv, H Zhou, JW Liu, J Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a
new target domain by actively selecting a limited number of target data to annotate. This …

Are binary annotations sufficient? Video moment retrieval via hierarchical uncertainty-based active learning

W Ji, R Liang, Z Zheng, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on video moment retrieval has mostly focused on enhancing
accuracy, efficiency, and robustness, all of which largely rely on the …

Panoptic scene graph generation with semantics-prototype learning

L Li, W Ji, Y Wu, M Li, Y Qin, L Wei… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships
(predicate) to connect human language and visual scenes. However, different language …

Gradient-regulated meta-prompt learning for generalizable vision-language models

J Li, M Gao, L Wei, S Tang, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prompt tuning, a recently emerging paradigm, enables powerful vision-language pre-training
models to adapt to downstream tasks in a parameter- and data-efficient way, by …

Video-audio domain generalization via confounder disentanglement

S Zhang, X Feng, W Fan, W Fang, F Feng… - Proceedings of the …, 2023 - ojs.aaai.org
Existing video-audio understanding models are trained and evaluated in an intra-domain
setting, facing performance degeneration in real-world applications where multiple domains …

Multi-modal action chain abductive reasoning

M Li, T Wang, J Xu, K Han, S Zhang… - Proceedings of the …, 2023 - aclanthology.org
Abductive reasoning has long been considered a core ability of humans, which
enables us to infer the most plausible explanation of incomplete known phenomena in daily …

Learning in imperfect environment: Multi-label classification with long-tailed distribution and partial labels

W Zhang, C Liu, L Zeng, B Ooi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Conventional multi-label classification (MLC) methods assume that all samples are fully
labeled and identically distributed. Unfortunately, this assumption is unrealistic in large …

Unsupervised domain adaptation for video object grounding with cascaded debiasing learning

M Li, H Zhang, J Li, Z Zhao, W Zhang, S Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org
This paper addresses Unsupervised Domain Adaptation (UDA) for the dense frame
prediction task of Video Object Grounding (VOG). This investigation springs from the …