Verbs in action: Improving verb understanding in video-language models

L Momeni, M Caron, A Nagrani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding verbs is crucial to modelling how people and objects interact with each other
and the environment through space and time. Recently, state-of-the-art video-language …

Few-shot class-incremental learning via entropy-regularized data-free replay

H Liu, L Gu, Z Chi, Y Wang, Y Yu, J Chen… - European Conference on …, 2022 - Springer
Few-shot class-incremental learning (FSCIL) has been proposed aiming to enable a deep
learning system to incrementally learn new classes with limited data. Recently, a pioneer …

MetaGCD: Learning to continually learn in generalized category discovery

Y Wu, Z Chi, Y Wang, S Feng - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we consider a real-world scenario where a model that is trained on pre-defined
classes continually encounters unlabeled data that contains both known and novel classes …

Test of time: Instilling video-language models with a sense of time

P Bagad, M Tapaswi… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Modelling and understanding time remains a challenge in contemporary video
understanding models. With language emerging as a key driver towards powerful …

StepFormer: Self-supervised step discovery and localization in instructional videos

N Dvornik, I Hadji, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Instructional videos are an important resource to learn procedural tasks from human
demonstrations. However, the instruction steps in such videos are typically short and sparse …

Meta-DMoE: Adapting to domain shift by meta-distillation from mixture-of-experts

T Zhong, Z Chi, L Gu, Y Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
In this paper, we tackle the problem of domain shift. Most existing methods perform training
on multiple source domains using a single model, and the same trained model is used on all …

RDT-1B: a diffusion foundation model for bimanual manipulation

S Liu, L Wu, B Li, H Tan, H Chen, Z Wang, K Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Bimanual manipulation is essential in robotics, yet developing foundation models is
extremely challenging due to the inherent complexity of coordinating two robot arms …

PointCMP: Contrastive mask prediction for self-supervised learning on point cloud videos

Z Shen, X Sheng, L Wang, Y Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Self-supervised learning can extract representations of good quality from solely unlabeled
data, which is appealing for point cloud videos due to their high labelling cost. In this paper …

Test-time domain adaptation by learning domain-aware batch normalization

Y Wu, Z Chi, Y Wang, KN Plataniotis… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Test-time domain adaptation aims to adapt the model trained on source domains to unseen
target domains using a few unlabeled images. Emerging research has shown that the label …

Universal time-series representation learning: A survey

P Trirat, Y Shin, J Kang, Y Nam, J Na, M Bae… - arXiv preprint arXiv …, 2024 - arxiv.org
Time-series data exists in every corner of real-world systems and services, ranging from
satellites in the sky to wearable devices on human bodies. Learning representations by …