Zelda: Video analytics using vision-language models

F Romero, C Winston, J Hauswald, M Zaharia… - arXiv preprint arXiv …, 2023 - arxiv.org
Advances in ML have motivated the design of video analytics systems that allow for
structured queries over video datasets. However, existing systems limit query expressivity …

Vaas: video analytics at scale

F Bastani, O Moll, S Madden - 2020 - dspace.mit.edu
We demonstrate Vaas, a video analytics system for large-scale datasets. Vaas provides an
interactive interface to rapidly develop and experiment with different workflows for solving a …

Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

S Chen, Y Yuan, S Chen, Z Jie, L Ma - arXiv preprint arXiv:2406.08024, 2024 - arxiv.org
Amidst the advancements in image-based Large Vision-Language Models (image-LVLM),
the transition to video-based models (video-LVLM) is hindered by the limited availability of …

Panorama: a data system for unbounded vocabulary querying over video

Y Zhang, A Kumar - Proceedings of the VLDB Endowment, 2019 - dl.acm.org
Deep convolutional neural networks (CNNs) achieve state-of-the-art accuracy for many
computer vision tasks. But using them for video monitoring applications incurs high …

VisFlow: a relational platform for efficient large-scale video analytics

Y Lu, A Chowdhery, S Kandula - ACM Symposium on Cloud …, 2016 - microsoft.com
We describe VisFlow, a system that efficiently analyzes the feeds from many cameras.
Ubiquitous camera deployments are widely used for security, traffic monitoring, and …

BlazeIt: Optimizing declarative aggregation and limit queries for neural network-based video analytics

D Kang, P Bailis, M Zaharia - arXiv preprint arXiv:1805.01046, 2018 - arxiv.org
Recent advances in neural networks (NNs) have enabled automatic querying of large
volumes of video data with high accuracy. While these deep NNs can produce accurate …

Optimizing video analytics with declarative model relationships

F Romero, J Hauswald, A Partap, D Kang… - Proceedings of the …, 2022 - dl.acm.org
The availability of vast video collections and the accuracy of ML models has generated
significant interest in video analytics systems. Since naively processing all frames using …

ViTA: An Efficient Video-to-Text Algorithm using VLM for RAG-based Video Analysis System

MA Arefeen, B Debnath, MYS Uddin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Retrieval-augmented generation (RAG) is used in natural language processing (NLP) to
provide query-relevant information in enterprise documents to large language models …

Seiden: Revisiting query processing in video database systems

J Bang, GT Kakkar, P Chunduri, S Mitra… - Proceedings of the VLDB …, 2023 - par.nsf.gov
State-of-the-art video database management systems (VDBMSs) often use lightweight proxy
models to accelerate object retrieval and aggregate queries. The key assumption underlying …

Streaming long video understanding with large language models

R Qian, X Dong, P Zhang, Y Zang, S Ding, D Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents VideoStreaming, an advanced vision-language large model (VLLM) for
video understanding, that capably understands arbitrary-length video with a constant …