Non-adaptive Online Finetuning for Offline Reinforcement Learning

A Huang, M Ghavamzadeh, N Jiang, M Petrik - NeurIPS 2023 Workshop … - openreview.net
NeurIPS 2023 Workshop on Generalization in Planning
Offline reinforcement learning (RL) has emerged as an important framework for applying RL to real-life applications. However, the complete lack of online interactions causes technical difficulties, and the _online finetuning_ setting incorporates a limited form of online interactions---which is often available in practice---to address these challenges. Unfortunately, current theoretical frameworks for online finetuning either assume high online sample complexity and/or require deploying fully adaptive algorithms (i.e., unlimited policy changes), which restricts their application to real-world settings where online interactions and policy updates are expensive and limited. In this paper, we develop a new framework for online finetuning. Instead of competing with the optimal policy (which inherits the high sample complexity and adaptivity requirements of online RL), we aim to learn a new policy that improves as much as possible over the existing policy using a _pre-specified_ number of online samples and with a _non-adaptive_ data-collection policy. Our formulation reveals surprising nuances and suggests novel principles that distinguish the finetuning problem from purely online and offline RL.
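To make the protocol described in the abstract concrete, the sketch below illustrates what a non-adaptive finetuning loop could look like: a single data-collection policy is fixed before any online interaction, exactly the pre-specified number of online samples is gathered with it, and the new policy is produced by one offline-style update on the combined data. This is a minimal illustration under assumed interfaces (`env`, `behavior_policy`, `offline_rl_update` are hypothetical placeholders), not the algorithm or analysis from the paper; in particular, the choice of the behavior policy as the collection policy is only an example, whereas the paper studies which non-adaptive choices are good.

```python
def nonadaptive_finetune(offline_data, behavior_policy, env, n_online, offline_rl_update):
    """Hypothetical sketch of a non-adaptive online finetuning protocol:
    (1) fix one data-collection policy before any online interaction,
    (2) spend the pre-specified budget of n_online samples with that policy,
    (3) run a single offline-style update on the combined dataset.
    All interfaces here are assumed placeholders, not the paper's method."""
    # Step 1: commit to an exploration policy up front (example choice only).
    collection_policy = behavior_policy

    # Step 2: collect exactly n_online transitions without ever switching policies.
    online_data = []
    state = env.reset()
    for _ in range(n_online):
        action = collection_policy(state)
        next_state, reward, done = env.step(action)
        online_data.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

    # Step 3: one offline update on offline + online data yields the new policy,
    # which is meant to improve over the existing policy rather than compete
    # with the optimal one.
    return offline_rl_update(offline_data + online_data)
```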