Multi-perspective traffic video description model with fine-grained refinement approach

TA To, MN Tran, TB Ho, TL Ha… - Proceedings of the …, 2024 - openaccess.thecvf.com
The analysis of traffic patterns is crucial for enhancing safety and optimizing flow within
urban cities. While urban cities possess extensive camera networks for monitoring the raw …

Videoprism: A foundational visual encoder for video understanding

L Zhao, NB Gundavarapu, L Yuan, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video
understanding tasks with a single frozen model. We pretrain VideoPrism on a …

Revisiting feature prediction for learning visual representations from video

A Bardes, Q Garrido, J Ponce, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper explores feature prediction as a stand-alone objective for unsupervised learning
from video and introduces V-JEPA, a collection of vision models trained solely using a …

V-jepa: Latent video prediction for visual representation learning

A Bardes, Q Garrido, J Ponce, X Chen, M Rabbat… - 2023 - openreview.net
This paper shows that the masked-modelling principle driving the success of large
foundational language models can be effectively applied to video by making predictions in …

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model

X Li, Z Huang, J Wang, K Li, L Wang - arXiv preprint arXiv:2407.06491, 2024 - arxiv.org
With the growth of high-quality data and advancement in visual pre-training paradigms,
Video Foundation Models (VFMs) have made significant progress recently, demonstrating …

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

T Nguyen, Y Bin, J Xiao, L Qu, Y Li, JZ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans use multiple senses to comprehend the environment. Vision and language are two
of the most vital senses since they allow us to easily communicate our thoughts and …

Foundation Models for Video Understanding: A Survey

N Madan, A Møgelmose, R Modi, YS Rawat… - arXiv preprint arXiv …, 2024 - arxiv.org
Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various
video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs …

General Purpose and Interactive Video Analytics

FAR Llamas - 2023 - search.proquest.com
The proliferation of video collections and the increased capabilities of machine learning
models have led to a growing desire for video analytics—the process of extracting insights …

[PDF][PDF] LVS: A Learned Video Storage for Fast and Efficient Video Understanding

Y Lee, J Park - jongse-park.github.io
As video understanding (VU) promises unprecedented capabilities in the era of video data
explosion, its efficient computation plays a critical role in practicalizing the algorithmic …