Expandable subspace ensemble for pre-trained model-based class-incremental learning

DW Zhou, HL Sun, HJ Ye… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Class-Incremental Learning (CIL) requires a learning system to continually learn
new classes without forgetting. Despite the strong performance of Pre-Trained Models …

Class-incremental grouping network for continual audio-visual learning

S Mo, W Pian, Y Tian - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Continual learning is a challenging problem in which models need to be trained on
non-stationary data across sequential tasks for class-incremental learning. While previous …

Weakly-supervised audio-visual segmentation

S Mo, B Raj - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for
sound sources in a video. Previous work applied a comprehensive manually designed …

Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling

S Mo, P Morgado - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Humans possess a remarkable ability to integrate auditory and visual information enabling a
deeper understanding of the surrounding environment. This early fusion of audio and visual …

Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey

A Shahabaz, S Sarkar - IEEE Access, 2024 - ieeexplore.ieee.org
The joint analysis of audio and video is a powerful tool that can be applied to various
contexts, including action, speech, and sound recognition, audio-visual video parsing …

UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection

Y Xiao, RK Das - arXiv preprint arXiv:2407.03657, 2024 - arxiv.org
This work explores class-incremental learning (CIL) for sound event detection (SED),
advancing adaptability towards real-world scenarios. CIL's success in domains like …

Unified Video-Language Pre-training with Synchronized Audio

S Mo, H Wang, H Li, X Tang - arXiv preprint arXiv:2405.07202, 2024 - arxiv.org
Video-language pre-training is a typical and challenging problem that aims at learning
visual and textual representations from large-scale data in a self-supervised way. Existing …

Answering Diverse Questions via Text Attached with Key Audio-Visual Clues

Q Ye, Z Yu, X Liu - arXiv preprint arXiv:2403.06679, 2024 - arxiv.org
Audio-visual question answering (AVQA) requires reference to video content and auditory
information, followed by correlating the question to predict the most precise answer …

Continual Contrastive Spoken Language Understanding

U Cappellazzo, E Fini, M Yang, D Falavigna… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, neural networks have shown impressive progress across diverse fields, with
speech processing being no exception. However, recent breakthroughs in this area require …

Audio-visual Generalized Zero-shot Learning the Easy Way

S Mo, P Morgado - arXiv preprint arXiv:2407.13095, 2024 - arxiv.org
Audio-visual generalized zero-shot learning is a rapidly advancing domain that seeks to
understand the intricate relations between audio and visual cues within videos. The …