PERSONA: A Reproducible Testbed for Pluralistic Alignment

L Castricato, N Lile, R Rafailov, JP Fränken… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of language models (LMs) necessitates robust alignment with
diverse user values. However, current preference optimization approaches often fail to …

AdaptAgent: Adapting multimodal web agents with few-shot learning from human demonstrations

G Verma, R Kaur, N Srishankar, Z Zeng, T Balch… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art multimodal web agents, powered by Multimodal Large Language Models
(MLLMs), can autonomously execute many web tasks by processing user instructions and …

ChainBuddy: An AI Agent System for Generating LLM Pipelines

J Zhang, I Arawjo - arXiv preprint arXiv:2409.13588, 2024 - arxiv.org
As large language models (LLMs) advance, their potential applications have grown
significantly. However, it remains difficult to evaluate LLM behavior on user-specific tasks …

Open-domain implicit format control for large language model generation

Y Yao, W Ma, X Fang, X Jiang, X Li, X Meng… - arXiv preprint arXiv …, 2024 - arxiv.org
Controlling the format of outputs generated by large language models (LLMs) is a critical
functionality in various applications. Current methods typically employ constrained decoding …

Aligning LLMs with Domain Invariant Reward Models

D Wu, S Choudhury - arXiv preprint arXiv:2501.00911, 2025 - arxiv.org
Aligning large language models (LLMs) to human preferences is challenging in domains
where preference data is unavailable. We address the problem of learning reward models …

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment

D Zhang, G Lan, DJ Han, W Yao, X Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) methods are emerging as a way to
fine-tune diffusion models (DMs) for visual generation. However, commonly used on-policy …

No Preference Left Behind: Group Distributional Preference Optimization

B Yao, Z Cai, YS Chuang, S Yang, M Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Preferences within a group of people are not uniform but follow a distribution. While existing
alignment methods like Direct Preference Optimization (DPO) attempt to steer models to …

Improving LLM generation with inverse and forward alignment: Reward modeling, prompting, fine-tuning, and inference-time optimization

H Sun, T Pouplin, N Astorga, T Liu… - The First Workshop on … - openreview.net
Large Language Models (LLMs) are often characterized as samplers or generators in the
literature, yet maximizing their capabilities in these roles is a complex challenge. Previous …

Granting Non-AI Experts Creative Control Over AI Systems

MS Lam - Adjunct Proceedings of the 37th Annual ACM …, 2024 - dl.acm.org
Many harmful behaviors and problematic deployments of AI stem from the fact that AI experts
are not experts in the vast array of settings where AI is applied. Non-AI experts from these …

Superficial Alignment, Subtle Divergence, and Nudge Sensitivity in LLM Decision-Making

M Cherep, N Singh, P Maes - NeurIPS 2024 Workshop on Behavioral … - openreview.net
LLMs are being set loose in complex, real-world environments involving sequential decision-
making and tool use. Often, this involves making choices on behalf of human users. Not …