Mm-llms: Recent advances in multimodal large language models

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Combating misinformation in the age of llms: Opportunities and challenges

C Chen, K Shu - arXiv preprint arXiv:2311.05656, 2023 - arxiv.org
Misinformation such as fake news and rumors is a serious threat on information ecosystems
and public trust. The emergence of Large Language Models (LLMs) has great potential to …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Vila: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual language models (VLMs) rapidly progressed with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Sharegpt4v: Improving large multi-modal models with better captions

L Chen, J Li, X Dong, P Zhang, C He, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet
often constrained by the scarcity of high-quality image-text data. To address this bottleneck …

Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Z Chen, J Wu, W Wang, W Su, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However the progress in vision and vision …

Mitigating object hallucinations in large vision-language models through visual contrastive decoding

S Leng, H Zhang, G Chen, X Li, S Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Large Vision-Language Models (LVLMs) have advanced considerably intertwining
visual recognition and language understanding to generate content that is not only coherent …

Benchmarking large language models in retrieval-augmented generation

J Chen, H Lin, X Han, L Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the
hallucination of large language models (LLMs). However, existing research lacks rigorous …

Large language models for information retrieval: A survey

Y Zhu, H Yuan, S Wang, J Liu, W Liu, C Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
As a primary means of information acquisition, information retrieval (IR) systems, such as
search engines, have integrated themselves into our daily lives. These systems also serve …

The 8th AI City Challenge

S Wang, DC Anastasiu, Z Tang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract The eighth AI City Challenge highlighted the convergence of computer vision and
artificial intelligence in areas like retail warehouse settings and Intelligent Traffic Systems …