Related articles - Academic Resource Search

Cityllava: Efficient fine-tuning for vlms in city scenario

Z Duan, H Cheng, D Xu, X Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

In the vast and dynamic landscape of urban settings Traffic Safety Description and Analysis
plays a pivotal role in applications ranging from insurance inspection to accident prevention …

Cited by 1 · Related articles · All 3 versions

[PDF] thecvf.com

Multi-perspective traffic video description model with fine-grained refinement approach

TA To, MN Tran, TB Ho, TL Ha… - Proceedings of the …, 2024 - openaccess.thecvf.com

The analysis of traffic patterns is crucial for enhancing safety and optimizing flow within
urban cities. While urban cities possess extensive camera networks for monitoring the raw …

Cited by 1 · Related articles

[PDF] thecvf.com

Divide and conquer boosting for enhanced traffic safety description and analysis with large vision language model

KT Xuan, KN Nguyen, BH Ngo… - Proceedings of the …, 2024 - openaccess.thecvf.com

The increasing complexity of traffic dynamics has underscored the necessity for advanced
traffic safety description and analysis challenging the efficacy of current methodologies in …

Cited by 1 · Related articles

[PDF] thecvf.com

Vila: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com

Visual language models (VLMs) rapidly progressed with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Cited by 55 · Related articles · All 4 versions

[PDF] thecvf.com

Trafficvlm: A controllable visual language model for traffic video captioning

QM Dinh, MK Ho, AQ Dang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Traffic video description and analysis have received much attention recently due to the
growing demand for efficient and reliable urban surveillance systems. Most existing methods …

Cited by 1 · Related articles · All 3 versions

[PDF] thecvf.com

Regiongpt: Towards region understanding vision language model

Q Guo, S De Mello, H Yin, W Byeon… - Proceedings of the …, 2024 - openaccess.thecvf.com

Vision language models (VLMs) have experienced rapid advancements through the
integration of large language models (LLMs) with image-text pairs yet they struggle with …

Cited by 5 · Related articles · All 3 versions

[PDF] thecvf.com

Lavender: Unifying video-language understanding as masked language modeling

L Li, Z Gan, K Lin, CC Lin, Z Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Unified vision-language frameworks have greatly advanced in recent years, most of which
adopt an encoder-decoder architecture to unify image-text tasks as sequence-to-sequence …

Cited by 68 · Related articles · All 6 versions

[PDF] arxiv.org

Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving

A Gopalkrishnan, R Greer, M Trivedi - arXiv preprint arXiv:2403.19838, 2024 - arxiv.org

Vision-Language Models (VLMs) and Multi-Modal Language models (MMLMs) have
become prominent in autonomous driving research, as these models can provide …

Cited by 1 · Related articles · All 3 versions

[PDF] thecvf.com

Probing conceptual understanding of large visual-language models

M Schiappa, R Abdullah, S Azad… - Proceedings of the …, 2024 - openaccess.thecvf.com

In recent years large visual-language (V+ L) models have achieved great success in various
downstream tasks. However it is not well studied whether these models have a conceptual …

Cited by 9 · Related articles · All 5 versions

[PDF] arxiv.org

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

K Zhou, K Lee, T Misu, XE Wang - arXiv preprint arXiv:2310.05872, 2023 - arxiv.org

In our work, we explore the synergistic capabilities of pre-trained vision-and-language
models (VLMs) and large language models (LLMs) for visual commonsense reasoning …

Cited by 1 · Related articles · All 3 versions
