Image Analysis in Autonomous Vehicles: A Review of the Latest AI Solutions and Their Comparison

M Kozłowski, S Racewicz, S Wierzbicki - Applied Sciences, 2024 - mdpi.com
The integration of advanced image analysis using artificial intelligence (AI) is pivotal for the
evolution of autonomous vehicles (AVs). This article provides a thorough review of the most …

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

G Xu, P Jin, L Hao, Y Song, L Sun, L Yuan - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models have demonstrated substantial advancements in reasoning
capabilities, particularly through inference-time scaling, as illustrated by models such as …

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

R Liao, M Erler, H Wang, G Zhai, G Zhang, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
In the video-language domain, recent works in leveraging zero-shot Large Language Model-
based reasoning for video understanding have become competitive challengers to previous …

Video Question Answering: A survey of the state-of-the-art

PJ Jeshmol, BC Kovoor - Journal of Visual Communication and Image …, 2024 - Elsevier
Video Question Answering (VideoQA) emerges as a prominent trend in the domain
of Artificial Intelligence, Computer Vision, and Natural Language Processing. It involves …

A simple yet effective knowledge guided method for entity-aware video captioning on a basketball benchmark

Z Xi, G Shi, X Li, J Yan, Z Li, L Wu, Z Liu, L Wang - Neurocomputing, 2025 - Elsevier
Despite the recent emergence of video captioning models, how to generate the text
description with specific entity names and fine-grained actions is far from being solved …

Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification

SP Sharan, M Choi, S Shah, H Goel, M Omama… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in text-to-video models such as Sora, Gen-3, MovieGen, and
CogVideoX are pushing the boundaries of synthetic video generation, with adoption seen in …