A survey on evaluation of multimodal large language models

J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …

Benchmark evaluations, applications, and challenges of large vision language models: A survey

Z Li, X Wu, H Du, H Nghiem, G Shi - arXiv preprint arXiv:2501.02189, 2025 - arxiv.org
Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …

Can ChatGPT detect deepfakes? A study of using multimodal large language models for media forensics

S Jia, R Lyu, K Zhao, Y Chen, Z Yan… - Proceedings of the …, 2024 - openaccess.thecvf.com
DeepFakes which refer to AI-generated media content have become an increasing concern
due to their use as a means for disinformation. Detecting DeepFakes is currently solved with …

GM-DF: Generalized multi-scenario deepfake detection

Y Lai, Z Yu, J Yang, B Li, X Kang, L Shen - arXiv preprint arXiv:2406.20078, 2024 - arxiv.org
Existing face forgery detection usually follows the paradigm of training models in a single
domain, which leads to limited generalization capacity when unseen scenarios and …

Can We Leave Deepfake Data Behind in Training Deepfake Detector?

J Cheng, Z Yan, Y Zhang, Y Luo, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The generalization ability of deepfake detectors is vital for their applications in real-world
scenarios. One effective solution to enhance this ability is to train the models with manually …

A survey on multimodal benchmarks: In the era of large AI models

L Li, G Chen, H Shi, J Xiao, L Chen - arXiv preprint arXiv:2409.18142, 2024 - arxiv.org
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …

FFAA: Multimodal large language model based explainable open-world face forgery analysis assistant

Z Huang, B Xia, Z Lin, Z Mou, W Yang - arXiv preprint arXiv:2408.10072, 2024 - arxiv.org
The rapid advancement of deepfake technologies has sparked widespread public concern,
particularly as face forgery poses a serious threat to public information security. However …

Generalizing deepfake video detection with plug-and-play: Video-level blending and spatiotemporal adapter tuning

Z Yan, Y Zhao, S Chen, X Fu, T Yao, S Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Three key challenges hinder the development of current deepfake video detection: (1)
Temporal features can be complex and diverse: how can we identify general temporal …

Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction

F Trad, A Chehab - arXiv preprint arXiv:2412.02301, 2024 - arxiv.org
With the rise of sophisticated phishing attacks, there is a growing need for effective and
economical detection solutions. This paper explores the use of large multimodal agents …

A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning

NM Foteinopoulou, E Ghorbel, D Aouada - arXiv preprint arXiv …, 2024 - arxiv.org
Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like
face forgery detection, where viewers often struggle to distinguish between real and …