Audio-visual class-incremental learning

DW Zhou, HL Sun, HJ Ye… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Class-Incremental Learning (CIL) requires a learning system to continually learn
new classes without forgetting. Despite the strong performance of Pre-Trained Models …

Cited by 10 Related articles All 3 versions

[PDF] thecvf.com

Class-incremental grouping network for continual audio-visual learning

S Mo, W Pian, Y Tian - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Continual learning is a challenging problem in which models need to be trained on
non-stationary data across sequential tasks for class-incremental learning. While previous …

Cited by 12 Related articles All 5 versions

[PDF] neurips.cc

Weakly-supervised audio-visual segmentation

S Mo, B Raj - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc

Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for
sound sources in a video. Previous work applied a comprehensive manually designed …

Cited by 6 Related articles All 5 versions

[PDF] thecvf.com

Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling

S Mo, P Morgado - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Humans possess a remarkable ability to integrate auditory and visual information enabling a
deeper understanding of the surrounding environment. This early fusion of audio and visual …

Cited by 3 Related articles All 3 versions

[PDF] ieee.org

Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey

A Shahabaz, S Sarkar - IEEE Access, 2024 - ieeexplore.ieee.org

The joint analysis of audio and video is a powerful tool that can be applied to various
contexts, including action, speech, and sound recognition, audio-visual video parsing …

Related articles All 2 versions

[PDF] arxiv.org

UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection

Y Xiao, RK Das - arXiv preprint arXiv:2407.03657, 2024 - arxiv.org

This work explores class-incremental learning (CIL) for sound event detection (SED),
advancing adaptability towards real-world scenarios. CIL's success in domains like …

Cited by 1 Related articles All 2 versions

[PDF] arxiv.org

Unified Video-Language Pre-training with Synchronized Audio

S Mo, H Wang, H Li, X Tang - arXiv preprint arXiv:2405.07202, 2024 - arxiv.org

Video-language pre-training is a typical and challenging problem that aims at learning
visual and textual representations from large-scale data in a self-supervised way. Existing …

Related articles All 2 versions

[PDF] arxiv.org

Answering Diverse Questions via Text Attached with Key Audio-Visual Clues

Q Ye, Z Yu, X Liu - arXiv preprint arXiv:2403.06679, 2024 - arxiv.org

Audio-visual question answering (AVQA) requires reference to video content and auditory
information, followed by correlating the question to predict the most precise answer …

Related articles All 2 versions

[PDF] arxiv.org

Continual Contrastive Spoken Language Understanding

U Cappellazzo, E Fini, M Yang, D Falavigna… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, neural networks have shown impressive progress across diverse fields, with
speech processing being no exception. However, recent breakthroughs in this area require …

Audio-visual Generalized Zero-shot Learning the Easy Way

S Mo, P Morgado - arXiv preprint arXiv:2407.13095, 2024 - arxiv.org

Audio-visual generalized zero-shot learning is a rapidly advancing domain that seeks to
understand the intricate relations between audio and visual cues within videos. The …

Related articles All 2 versions
