Enhancing Large Vision Language Models with Self-Training on Image Comprehension

Y Deng, P Lu, F Yin, Z Hu, S Shen, J Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision language models (LVLMs) integrate large language models (LLMs) with pre-
trained vision encoders, thereby activating the perception capability of the model to …

Aligning large language models via self-steering optimization

H Xiang, B Yu, H Lin, K Lu, Y Lu, X Han, L Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated alignment develops alignment systems with minimal human intervention. The
key to automated alignment lies in providing learnable and accurate preference signals for …

PERSONA: A Reproducible Testbed for Pluralistic Alignment

L Castricato, N Lile, R Rafailov, JP Fränken… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of language models (LMs) necessitates robust alignment with
diverse user values. However, current preference optimization approaches often fail to …

Towards Scalable Automated Alignment of LLMs: A Survey

B Cao, K Lu, X Lu, J Chen, M Ren, H Xiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Alignment is the most critical step in building large language models (LLMs) that meet
human needs. With the rapid development of LLMs gradually surpassing human …

Is Free Self-Alignment Possible?

D Adila, C Shin, Y Zhang, F Sala - arXiv preprint arXiv:2406.03642, 2024 - arxiv.org
Aligning pretrained language models (LMs) is a complex and resource-intensive process,
often requiring access to large amounts of ground-truth preference data and substantial …

Can Language Models Safeguard Themselves, Instantly and For Free?

D Adila, C Shin, Y Zhang, F Sala - ICML 2024 Next Generation of AI Safety … - openreview.net
Aligning pretrained language models (LMs) to handle a new safety scenario is normally
difficult and expensive, often requiring access to large amounts of ground-truth preference …

[PDF] Generative Reward Models: A Unified Approach to RLHF and RLAIF

D Mahan, D Van Phung, R Rafailov, C Blagden, N Lile, L Castricato… - static.synthlabs.ai
Reinforcement Learning from Human Feedback (RLHF) has greatly improved the
performance of modern Large Language Models (LLMs). The RLHF process is resource …