MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Phi-3 technical report: A highly capable language model locally on your phone

M Abdin, J Aneja, H Awadalla, A Awadallah… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion
tokens, whose overall performance, as measured by both academic benchmarks and …

xGen-MM (BLIP-3): A family of open large multimodal models

L Xue, M Shu, A Awadalla, J Wang, A Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large
Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a …

Eyes closed, safety on: Protecting multimodal LLMs via image-to-text transformation

Y Gou, K Chen, Z Liu, L Hong, H Xu, Z Li… - … on Computer Vision, 2025 - Springer
Multimodal large language models (MLLMs) have shown impressive reasoning abilities.
However, they are also more vulnerable to jailbreak attacks than their LLM predecessors …

JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models

H Jin, L Hu, X Li, P Zhang, C Chen, J Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …

LMEye: An interactive perception network for large language models

Y Li, B Hu, X Chen, L Ma, Y Xu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Current efficient approaches to building Multimodal Large Language Models (MLLMs)
mainly incorporate visual information into LLMs with a simple visual mapping network such …

Harmful fine-tuning attacks and defenses for large language models: A survey

T Huang, S Hu, F Ilhan, SF Tekin, L Liu - arXiv preprint arXiv:2409.18169, 2024 - arxiv.org
Recent research demonstrates that the nascent fine-tuning-as-a-service business model
exposes serious safety concerns--fine-tuning on a few harmful data points uploaded by users …

Self-supervised visual preference alignment

K Zhu, L Zhao, Z Ge, X Zhang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
This paper makes the first attempt towards unsupervised preference alignment in Vision-
Language Models (VLMs). We generate chosen and rejected responses with regard to the …

Jailbreaking and mitigation of vulnerabilities in large language models

B Peng, Z Bi, Q Niu, M Liu, P Feng, T Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have transformed artificial intelligence by advancing
natural language understanding and generation, enabling applications across fields beyond …

Lazy safety alignment for large language models against harmful fine-tuning

T Huang, S Hu, F Ilhan, SF Tekin… - arXiv preprint arXiv …, 2024 - openreview.net
Recent studies show that Large Language Models (LLMs) with safety alignment can be
jailbroken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we …