Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Active Speaker Detection using Audio, Visual and Depth Modalities: A Survey

SNAM Robi, MAZM Ariffin, MAM Izhar, N Ahmad… - IEEE …, 2024 - ieeexplore.ieee.org
The rapid progress of multimodal signal processing in recent years has cleared the way for
novel applications in human-computer interaction, surveillance, and telecommunication …

Target active speaker detection with audio-visual cues

Y Jiang, R Tao, Z Pan, H Li - arXiv preprint arXiv:2305.12831, 2023 - arxiv.org
In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …

LoCoNet: Long-short context network for active speaker detection

X Wang, F Cheng, G Bertasius - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Active Speaker Detection (ASD) aims to identify who is speaking in each frame of a
video. Solving ASD involves using audio and visual information in two complementary …

MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks

X Yue, X Zhang, Y Chen, C Zhang, M Lao… - Proceedings of the …, 2024 - dl.acm.org
Class-incremental learning poses a significant challenge under an exemplar-free constraint,
leading to catastrophic forgetting and sub-par incremental accuracy. Previous attempts have …

A Lightweight Unsupervised Intrusion Detection Model Based on Variational Auto-Encoder

Y Ren, K Feng, F Hu, L Chen, Y Chen - Sensors, 2023 - mdpi.com
With the gradual integration of internet technology and the industrial control field, industrial
control systems (ICSs) have begun to access public networks on a large scale. Attackers use …

Joint audio-visual idling vehicle detection with streamlined input dependencies

X Li, R Mohammed, T Mangin, S Saha… - arXiv preprint arXiv …, 2024 - arxiv.org
Idling vehicle detection (IVD) can be helpful in monitoring and reducing unnecessary idling
and can be integrated into real-time systems to address the resulting pollution and harmful …

BIAS: A Body-based Interpretable Active Speaker Approach

T Roxo, JC Costa, PRM Inácio… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
State-of-the-art Active Speaker Detection (ASD) approaches heavily rely on audio and facial
features to perform, which is not a sustainable approach in wild scenarios. Although these …

Fish behavior recognition based on an audio-visual multimodal interactive fusion network

Y Yang, H Yu, X Zhang, P Zhang, W Tu, L Gu - Aquacultural Engineering, 2024 - Elsevier
In light of the challenges imposed by fish behavior recognition, which arise from
environmental noise and dim lighting in aquaculture environments and adversely affect the …

A real-time active speaker detection system integrating an audio-visual signal with a spatial querying mechanism

I Gurvich, I Leichter, DR Palle, Y Asher… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We introduce a distinctive real-time, causal, neural network-based active speaker detection
system optimized for low-power edge computing. This system drives a virtual …