Active Observing in Continuous-time Control

S Holt, A Hüyük… - Advances in Neural …, 2024 - proceedings.neurips.cc
The control of continuous-time environments while actively deciding when to take costly
observations in time is a crucial yet unexplored problem, particularly relevant to real-world …
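
To make the problem setup concrete, here is a minimal sketch of an active-observing control loop, assuming a fixed per-observation cost and Euler-discretized dynamics; the names (OBS_COST, simulate, should_observe) are illustrative and not from the paper's code.

    # Illustrative setup: at each step the agent decides whether to pay a cost
    # for a fresh observation or keep acting on its last belief of the state.
    OBS_COST = 0.1   # hypothetical per-observation cost
    DT = 0.05        # Euler step for the continuous-time dynamics

    def simulate(env_step, policy, should_observe, x0, horizon=200):
        """Roll out a policy that only observes when `should_observe` fires."""
        x, belief, total = x0, x0, 0.0
        for t in range(horizon):
            if should_observe(belief, t * DT):  # the "when to observe" decision
                belief = x                       # costly, exact observation
                total -= OBS_COST
            u = policy(belief)                   # act on the current belief
            x, r = env_step(x, u, DT)            # one Euler step of the dynamics
            total += r * DT
        return total

    # Toy usage: damped 1-D system, observe every 10th step.
    env = lambda x, u, dt: (x + dt * (-x + u), -x**2)
    print(simulate(env, lambda b: -0.5 * b, lambda b, t: round(t / DT) % 10 == 0, 1.0))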

Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL

H Sun, A Hüyük, M van der Schaar - The Twelfth International …, 2023 - openreview.net
In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models
(LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective …

What is Flagged in Uncertainty Quantification? Latent Density Models for Uncertainty Categorization

H Sun, B van Breugel, J Crabbé… - Advances in …, 2023 - proceedings.neurips.cc
Uncertainty quantification (UQ) is essential for creating trustworthy machine learning
models. Recent years have seen a steep rise in UQ methods that can flag suspicious …

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption

R Yang, H Zhong, J Xu, A Zhang, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline reinforcement learning (RL) presents a promising approach for learning effective
policies from offline datasets without the need for costly or unsafe interactions with the …

Reinforcement Learning in the Era of LLMs: What is Essential? What is Needed? An RL Perspective on RLHF, Prompting, and Beyond

H Sun - arXiv preprint arXiv:2310.06147, 2023 - arxiv.org
Recent advancements in Large Language Models (LLMs) have garnered wide attention and
led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to …

Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment

H Sun, M van der Schaar - arXiv preprint arXiv:2405.15624, 2024 - arxiv.org
Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility.
However, existing methods, primarily based on preference datasets, face challenges such …

Supervised Fine-Tuning as Inverse Reinforcement Learning

H Sun - arXiv preprint arXiv:2403.12017, 2024 - arxiv.org
The prevailing approach to aligning Large Language Models (LLMs) typically relies on
human or AI feedback and assumes access to specific types of preference datasets. In our …
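
As background on the framing in the title (a generic sketch, not necessarily the paper's exact derivation): supervised fine-tuning maximizes the log-likelihood of demonstrations,

    \max_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \log \pi_{\theta}(y \mid x) \right],

where \mathcal{D} is a dataset of prompts x with demonstrated responses y, and \pi_{\theta} is the LLM viewed as a policy. This objective is exactly behavioral cloning, the simplest way to learn from demonstrations; inverse RL generalizes it by inferring a reward under which the demonstrations are near-optimal and optimizing the policy against that reward.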

Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

J Xu, R Yang, F Luo, M Fang, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning policies from offline datasets through offline reinforcement learning (RL) holds
promise for scaling data-driven decision-making and avoiding unsafe and costly online …
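
For context on the sequence-modeling view named in the title: a Decision Transformer (Chen et al., 2021) treats offline RL as conditional prediction over (return-to-go, state, action) tokens. The sketch below builds such training windows from a logged trajectory; it shows only the generic recipe, not this paper's robust variant, and the function names are illustrative.

    import numpy as np

    def returns_to_go(rewards):
        """R_t = sum of rewards from step t to the end (undiscounted, as in DT)."""
        return np.cumsum(rewards[::-1])[::-1]

    def make_dt_windows(states, actions, rewards, context_len=4):
        """Slice a trajectory into fixed-length (return-to-go, state, action) windows."""
        rtg = returns_to_go(np.asarray(rewards, dtype=float))
        windows = []
        for t in range(len(states) - context_len + 1):
            sl = slice(t, t + context_len)
            windows.append((rtg[sl], states[sl], actions[sl]))
        return windows  # each window is one conditioning context for the transformer

    # Toy usage: a 6-step trajectory with scalar states and actions.
    s, a = np.arange(6.0), np.ones(6)
    r = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
    for rtg, st, ac in make_dt_windows(s, a, r):
        print(rtg, st, ac)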

Defining Expertise: Applications to Treatment Effect Estimation

A Hüyük, Q Wei, A Curth, M van der Schaar - arXiv preprint arXiv …, 2024 - arxiv.org
Decision-makers are often experts in their domain and take actions based on their domain
knowledge. Doctors, for instance, may prescribe treatments by predicting the likely outcome …