State-of-the-art multimodal web agents, powered by Multimodal Large Language Models (MLLMs), can autonomously execute many web tasks by processing user instructions and …
As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-specific tasks …
Controlling the format of outputs generated by large language models (LLMs) is a critical functionality in various applications. Current methods typically employ constrained decoding …
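This entry alludes to constrained decoding, which is typically implemented by masking the model's next-token logits so that only tokens permitted by the target format can be emitted. A minimal sketch of that idea, assuming a toy vocabulary and a hypothetical `allowed_ids` set supplied by some format checker (both illustrative, not from the cited paper):

```python
import numpy as np

def constrained_step(logits: np.ndarray, allowed_ids: set[int]) -> int:
    """One decoding step: mask out disallowed tokens, then pick greedily.

    `logits` are the model's next-token scores; `allowed_ids` would come
    from a format checker (e.g., a JSON grammar) and is an assumption here.
    """
    masked = np.full_like(logits, -np.inf)
    idx = list(allowed_ids)
    masked[idx] = logits[idx]
    return int(np.argmax(masked))  # greedy; sampling would softmax `masked`

# Toy usage: 5-token vocabulary where only tokens {1, 3} fit the format.
logits = np.array([2.0, 0.5, 1.7, 1.0, -0.3])
print(constrained_step(logits, {1, 3}))  # -> 1, the best-scoring allowed token
```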
D Wu, S Choudhury - arXiv preprint arXiv:2501.00911, 2025 - arxiv.org
Aligning large language models (LLMs) to human preferences is challenging in domains where preference data is unavailable. We address the problem of learning reward models …
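For context on what this work departs from: the standard reward-model recipe assumes pairwise preference data and fits a Bradley-Terry objective over chosen/rejected responses. A minimal sketch of that baseline loss, with random tensors standing in for a real reward model's outputs; none of this is from the cited paper:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected).

    Inputs are scalar rewards the model assigns to the preferred and
    dispreferred response in each pair.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch of 8 preference pairs.
r_c, r_r = torch.randn(8), torch.randn(8)
print(bradley_terry_loss(r_c, r_r))
```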
Reinforcement learning from human feedback (RLHF) methods are emerging as a way to fine-tune diffusion models (DMs) for visual generation. However, commonly used on-policy …
Preferences within a group of people are not uniform but follow a distribution. While existing alignment methods like Direct Preference Optimization (DPO) attempt to steer models to …
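The DPO objective referenced here optimizes the policy directly from preference pairs, with no separate reward model. A minimal sketch of the published loss, assuming per-response summed log-probabilities have already been computed under the trained policy and a frozen reference policy; the tensor names are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))).

    `pi_*` are log-probs of the chosen/rejected responses under the policy
    being trained; `ref_*` are the same quantities under the reference model.
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

# Toy batch of 4 preference pairs.
lp = lambda: torch.randn(4)
print(dpo_loss(lp(), lp(), lp(), lp()))
```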
Large Language Models (LLMs) are often characterized as samplers or generators in the literature, yet maximizing their capabilities in these roles is a complex challenge. Previous …
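Since this entry frames LLMs as samplers, a quick illustration of what token-level sampling involves: temperature scaling plus nucleus (top-p) filtering of the next-token distribution. A minimal sketch under those standard definitions, not tied to the cited paper:

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 0.8, top_p: float = 0.9) -> int:
    """Sample one token id with temperature scaling and top-p filtering."""
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))   # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # most probable first
    keep = np.cumsum(probs[order]) <= top_p   # nucleus of cumulative mass
    keep[0] = True                            # always keep the top token
    kept = order[keep]
    p = probs[kept] / probs[kept].sum()
    return int(np.random.choice(kept, p=p))

print(sample_token(np.array([2.0, 1.0, 0.2, -1.0])))
```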
MS Lam - Adjunct Proceedings of the 37th Annual ACM …, 2024 - dl.acm.org
Many harmful behaviors and problematic deployments of AI stem from the fact that AI experts are not experts in the vast array of settings where AI is applied. Non-AI experts from these …
LLMs are being set loose in complex, real-world environments involving sequential decision-making and tool use. Often, this involves making choices on behalf of human users. Not …