Enhancing Large Vision Language Models with Self-Training on Image Comprehension

Y Deng, P Lu, F Yin, Z Hu, S Shen, J Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision language models (LVLMs) integrate large language models (LLMs) with pre-
trained vision encoders, thereby activating the perception capability of the model to …

Aligning large language models via self-steering optimization

H Xiang, B Yu, H Lin, K Lu, Y Lu, X Han, L Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated alignment develops alignment systems with minimal human intervention. The
key to automated alignment lies in providing learnable and accurate preference signals for …

PERSONA: A Reproducible Testbed for Pluralistic Alignment

L Castricato, N Lile, R Rafailov, JP Fränken… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of language models (LMs) necessitates robust alignment with
diverse user values. However, current preference optimization approaches often fail to …

Towards Scalable Automated Alignment of LLMs: A Survey

B Cao, K Lu, X Lu, J Chen, M Ren, H Xiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Alignment is the most critical step in building large language models (LLMs) that meet
human needs. With the rapid development of LLMs gradually surpassing human …

Is Free Self-Alignment Possible?

D Adila, C Shin, Y Zhang, F Sala - arXiv preprint arXiv:2406.03642, 2024 - arxiv.org
Aligning pretrained language models (LMs) is a complex and resource-intensive process,
often requiring access to large amounts of ground-truth preference data and substantial …

Can Language Models Safeguard Themselves, Instantly and For Free?

D Adila, C Shin, Y Zhang, F Sala - ICML 2024 Next Generation of AI Safety … - openreview.net
Aligning pretrained language models (LMs) to handle a new safety scenario is normally
difficult and expensive, often requiring access to large amounts of ground-truth preference …

[PDF] Generative Reward Models: A Unified Approach to RLHF and RLAIF

D Mahan, D Van Phung, R Rafailov, C Blagden, N Lile, L Castricato… - static.synthlabs.ai
Reinforcement Learning from Human Feedback (RLHF) has greatly improved the
performance of modern Large Language Models (LLMs). The RLHF process is resource …