Hierarchical contrast for unsupervised skeleton-based action representation learning

J Dong, S Sun, Z Liu, S Chen, B Liu… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
This paper targets unsupervised skeleton-based action representation learning and
proposes a new Hierarchical Contrast (HiCo) framework. Different from the existing …

Audiovisual video summarization

B Zhao, M Gong, X Li - IEEE Transactions on Neural Networks …, 2021 - ieeexplore.ieee.org
Audio and vision are two main modalities in video data. Multimodal learning, especially for
audiovisual learning, has drawn considerable attention recently, which can boost the …

Learning dual-routing capsule graph neural network for few-shot video classification

Y Feng, J Gao, C Xu - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
Few-shot video classification (video FSL), which learns classifiers for novel concepts
from only a few samples, has gained increasing attention in the last few years. The existing …

STEPs: Self-supervised key step extraction and localization from unlabeled procedural videos

A Shah, B Lundell, H Sawhney… - Proceedings of the …, 2023 - openaccess.thecvf.com
We address the problem of extracting key steps from unlabeled procedural videos,
motivated by the potential of Augmented Reality (AR) headsets to revolutionize job training …

Condensing Video Content: Deep Learning Advancements and Challenges in Video Summarization Innovations

F Shamsi, I Sindhu - IEEE Access, 2025 - ieeexplore.ieee.org
With the rapid growth of social media platforms, the volume of video content on the internet
has increased exponentially. YouTube, the most popular social networking platform …

Spatiotemporal Orthogonal Projection Capsule Network for Incremental Few-Shot Action Recognition

Y Feng, J Gao, C Xu - IEEE Transactions on Multimedia, 2024 - ieeexplore.ieee.org
In this paper, we propose a new task named incremental few-shot action recognition
(IFSAR), which aims to learn new action classes incrementally with limited samples. Existing …

Spatial-temporal exclusive capsule network for open set action recognition

Y Feng, J Gao, S Yang, C Xu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Open set action recognition (OSAR) is a rising research domain that simultaneously
identifies all videos from known classes and rejects videos from unknown classes. Existing …

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model

Y Huang, J Xu, B Pei, Y He, G Chen, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Vinci, a real-time embodied smart assistant built upon an egocentric
vision-language model. Designed for deployment on portable devices such as smartphones and …

Emotion knowledge driven video highlight detection

F Qi, X Yang, C Xu - IEEE Transactions on Multimedia, 2020 - ieeexplore.ieee.org
This paper addresses video highlight detection which aims to select a small subset of frames
according to user's major or special interest. The performances of conventional methods …

Learning scene-aware spatio-temporal GNNs for few-shot early action prediction

Y Hu, J Gao, C Xu - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
We aim to address a new task named few-shot early action prediction (FS-EAP) that learns
classifiers for novel actions from only a few partially observed videos. We argue that the task …