Related articles - Academic resource search

Active speaker detection with audio-visual co-training

P Chakravarty, J Zegers, T Tuytelaars… - Proceedings of the 18th …, 2016 - dl.acm.org

In this work, we show how to co-train a classifier for active speaker detection using audio-
visual data. First, audio Voice Activity Detection (VAD) is used to train a personalized video …

Cited by 19 · Related articles · All 6 versions

[PDF] arxiv.org

Cross-modal supervision for learning active speaker detection in video

P Chakravarty, T Tuytelaars - … Amsterdam, The Netherlands, October 11-14 …, 2016 - Springer

In this paper, we show how to use audio to supervise the learning of active speaker
detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based …

Cited by 62 · Related articles · All 8 versions

[PDF] ieee.org

Audio-visual activity guided cross-modal identity association for active speaker detection

R Sharma, S Narayanan - IEEE Open Journal of Signal …, 2023 - ieeexplore.ieee.org

Active speaker detection in videos addresses associating a source face, visible in the video
frames, with the underlying speech in the audio modality. The two primary sources of …

Cited by 5 · Related articles · All 3 versions

[PDF] kuleuven.be

Who's speaking? Audio-supervised classification of active speakers in video

P Chakravarty, S Mirzaei, T Tuytelaars… - Proceedings of the …, 2015 - dl.acm.org

Active speakers have traditionally been identified in video by detecting their moving lips.
This paper demonstrates the same using spatio-temporal features that aim to capture other …

Cited by 39 · Related articles · All 6 versions

[PDF] thecvf.com

MAAS: Multi-modal assignation for active speaker detection

JL Alcázar, F Caba, AK Thabet… - Proceedings of the …, 2021 - openaccess.thecvf.com

Active speaker detection requires a solid integration of multi-modal cues. While individual
modalities can approximate a solution, accurate predictions can only be achieved by …

Cited by 52 · Related articles · All 8 versions

[PDF] arxiv.org

Unsupervised active speaker detection in media content using cross-modal information

R Sharma, S Narayanan - arXiv preprint arXiv:2209.11896, 2022 - arxiv.org

We present a cross-modal unsupervised framework for active speaker detection in media
content such as TV shows and movies. Machine learning advances have enabled …

Cited by 2 · Related articles · All 2 versions

[PDF] arxiv.org

AVA active speaker: An audio-visual dataset for active speaker detection

J Roth, S Chaudhuri, O Klejch, R Marvin… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Active speaker detection is an important component in video analysis algorithms for
applications such as speaker diarization, video re-targeting for meetings, speech …

Cited by 169 · Related articles · All 6 versions

[PDF] thecvf.com

Active speakers in context

JL Alcázar, F Caba, L Mai, F Perazzi… - Proceedings of the …, 2020 - openaccess.thecvf.com

Current methods for active speaker detection focus on modeling audiovisual information
from a single speaker. This strategy can be adequate for addressing single-speaker …

Cited by 84 · Related articles · All 8 versions

[PDF] arxiv.org

Learning spatial-temporal graphs for active speaker detection

S Roy, K Min, S Tripathi, T Guha… - arXiv preprint arXiv …, 2021 - arxiv.org

We address the problem of active speaker detection through a new framework, called
SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship …

Cited by 3 · Related articles · All 2 versions

[PDF] acm.org

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

Cited by 159 · Related articles · All 5 versions