In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective …
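As a hedged illustration of what zero-shot prompt optimization can involve (the abstract above is truncated, so this is not the paper's method): candidate zero-shot prompts can be scored on a small arithmetic set and the best one kept. Everything below, including `dummy_llm`, `score`, and the candidate prompts, is a hypothetical placeholder.

```python
# Hypothetical sketch of zero-shot prompt selection; not the paper's algorithm.
candidates = [
    "Let's think step by step.",
    "Answer with the final number only.",
]
problems = [("What is 17 + 25?", "42"), ("What is 9 * 8?", "72")]

def dummy_llm(text: str) -> str:
    # Placeholder for a real chat-completion call.
    return "42"

def score(prompt: str, ask) -> float:
    # Fraction of problems whose gold answer appears in the model's reply.
    hits = sum(answer in ask(f"{question}\n{prompt}") for question, answer in problems)
    return hits / len(problems)

best = max(candidates, key=lambda p: score(p, dummy_llm))
print("best prompt:", best)
```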
Uncertainty quantification (UQ) is essential for creating trustworthy machine learning models. Recent years have seen a steep rise in UQ methods that can flag suspicious …
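A minimal sketch of one common way UQ methods flag suspicious inputs, assuming an ensemble or MC-dropout setup; this is a generic illustration, not the method of the paper above, and all names below are hypothetical.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    # Entropy of the mean predictive distribution over stochastic forward
    # passes; probs has shape (n_passes, n_inputs, n_classes).
    mean_probs = probs.mean(axis=0)
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)

def flag_suspicious(probs: np.ndarray, threshold: float) -> np.ndarray:
    # Flag inputs whose predictive entropy exceeds the threshold.
    return predictive_entropy(probs) > threshold

# Toy usage: 8 stochastic passes over 3 inputs with 4 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 3, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(flag_suspicious(probs, threshold=1.2))
```

Inputs whose predictive entropy exceeds the threshold are flagged for review; the threshold itself is typically tuned on held-out data.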
Offline reinforcement learning (RL) presents a promising approach for learning policies from offline datasets without the need for costly or unsafe interactions with the …
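A minimal sketch of the offline setting described above, using behavior cloning as the simplest stand-in for an offline RL algorithm (not the paper's method): a policy is fit purely to a logged dataset, with no environment interaction. The dataset and policy class below are hypothetical.

```python
import numpy as np

# Hypothetical logged dataset: states and the behavior policy's binary actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 4))
actions = (states @ np.array([0.5, -0.2, 0.1, 0.3]) > 0).astype(float)

# Behavior cloning: fit a logistic-regression policy to the logged actions
# via gradient descent, using only the fixed dataset.
w = np.zeros(4)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(states @ w)))
    w -= 0.1 * states.T @ (p - actions) / len(states)

imitation_acc = (((states @ w) > 0).astype(float) == actions).mean()
print(f"imitation accuracy: {imitation_acc:.2f}")
```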
H Sun - arXiv preprint arXiv:2310.06147, 2023 - arxiv.org
Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to …
Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility. However, existing methods, primarily based on preference datasets, face challenges such …
H Sun - arXiv preprint arXiv:2403.12017, 2024 - arxiv.org
The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback and assumes access to specific types of preference datasets. In our …
J Xu, R Yang, F Luo, M Fang, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online …
Decision-makers are often experts in their domain and take actions based on their domain knowledge. Doctors, for instance, may prescribe treatments by predicting the likely outcome …