Non-adaptive Online Finetuning for Offline Reinforcement Learning

A Huang, M Ghavamzadeh, N Jiang, M Petrik - NeurIPS 2023 Workshop … - openreview.net
NeurIPS 2023 Workshop on Generalization in Planning
Offline reinforcement learning (RL) has emerged as an important framework for applying RL to real-life applications. However, the complete lack of online interactions causes technical difficulties, and the _online finetuning_ setting incorporates a limited form of online interactions---which is often available in practice---to address these challenges. Unfortunately, current theoretical frameworks for online finetuning either assume high online sample complexity and/or require deploying fully adaptive algorithms (i.e., unlimited policy changes), which restricts their application to real-world settings where online interactions and policy updates are expensive and limited. In this paper, we develop a new framework for online finetuning. Instead of competing with the optimal policy (which inherits the high sample complexity and adaptivity requirements of online RL), we aim to learn a new policy that improves as much as possible over the existing policy using a _pre-specified_ number of online samples and with a _non-adaptive_ data-collection policy. Our formulation reveals surprising nuances and suggests novel principles that distinguish the finetuning problem from purely online and offline RL.
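To make the protocol described in the abstract concrete, the sketch below illustrates what a non-adaptive finetuning loop could look like: a single data-collection policy is fixed before any online interaction, exactly the pre-specified number of online samples is gathered with it, and the new policy is produced by one offline-style update on the combined data. This is a minimal illustration under assumed interfaces (`env`, `behavior_policy`, `offline_rl_update` are hypothetical placeholders), not the algorithm or analysis from the paper; in particular, the choice of the behavior policy as the collection policy is only an example, whereas the paper studies which non-adaptive choices are good.

```python
def nonadaptive_finetune(offline_data, behavior_policy, env, n_online, offline_rl_update):
    """Hypothetical sketch of a non-adaptive online finetuning protocol:
    (1) fix one data-collection policy before any online interaction,
    (2) spend the pre-specified budget of n_online samples with that policy,
    (3) run a single offline-style update on the combined dataset.
    All interfaces here are assumed placeholders, not the paper's method."""
    # Step 1: commit to an exploration policy up front (example choice only).
    collection_policy = behavior_policy

    # Step 2: collect exactly n_online transitions without ever switching policies.
    online_data = []
    state = env.reset()
    for _ in range(n_online):
        action = collection_policy(state)
        next_state, reward, done = env.step(action)
        online_data.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

    # Step 3: one offline update on offline + online data yields the new policy,
    # which is meant to improve over the existing policy rather than compete
    # with the optimal one.
    return offline_rl_update(offline_data + online_data)
```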