- Academic resource search

Learning to generate better than your LLM

JD Chang, K Brantley, R Ramamurthy, D Misra… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large
Language Models (LLMs) for conditional text generation. In particular, recent LLMs such as …

Cited by 24 · Related articles · All 3 versions

[PDF] ed.ac.uk

Learning natural locomotion behaviors for humanoid robots using human bias

C Yang, K Yuan, S Heng, T Komura… - IEEE Robotics and …, 2020 - ieeexplore.ieee.org

This letter presents a new learning framework that leverages the knowledge from imitation
learning, deep reinforcement learning, and control theories to achieve human-style …

Cited by 50 · Related articles · All 8 versions

[PDF] arxiv.org

Sampling-based exploration for reinforcement learning of dexterous manipulation

G Khandate, S Shang, ET Chang, TL Saidi… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we present a novel method for achieving dexterous manipulation of complex
objects, while simultaneously securing the object without the use of passive support …

Cited by 20 · Related articles · All 6 versions

[PDF] arxiv.org

ACDER: Augmented curiosity-driven experience replay

B Li, T Lu, J Li, N Lu, Y Cai… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org

Exploration in environments with sparse feedback remains a challenging research problem
in reinforcement learning (RL). When the RL agent explores the environment randomly, it …

Cited by 21 · Related articles · All 3 versions

[PDF] arxiv.org

Targeted search control in AlphaZero for effective policy improvement

A Trudeau, M Bowling - arXiv preprint arXiv:2302.12359, 2023 - arxiv.org

AlphaZero is a self-play reinforcement learning algorithm that achieves superhuman play in
chess, shogi, and Go via policy iteration. To be an effective policy improvement operator …

Cited by 2 · Related articles · All 7 versions

[PDF] arxiv.org

R×R: Rapid eXploration for Reinforcement Learning via Sampling-based Reset Distributions and Imitation Pre-training

G Khandate, TL Saidi, S Shang, ET Chang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present a method for enabling Reinforcement Learning of motor control policies for
complex skills such as dexterous manipulation. We posit that a key difficulty for training such …

Where2Start: Leveraging initial States for Robust and Sample-Efficient Reinforcement Learning

P Parsa, RZ Moayedi, M Bornosi, MM Bejani - arXiv preprint arXiv …, 2023 - arxiv.org

The reinforcement learning algorithms that focus on how to compute the gradient and
choose next actions, are effectively improved the performance of the agents. However, these …

Towards Specialized Reinforcement Learning From Diverse Data

JD Chang - 2024 - search.proquest.com

Reinforcement learning (RL) fundamentally focuses on teaching agents how to make
decisions by interacting with an environment. Unlike supervised learning approaches that …

[PDF] core.ac.uk

[PDF] Real-Time Hybrid Visual Servoing of a Redundant Manipulator via Deep Reinforcement Learning

AJ Williams - 2023 - core.ac.uk

Fixtureless assembly may be necessary in some manufacturing tasks and environments due
to various constraints but poses challenges for automation due to nondeterministic …

Learning to Generate Better than your Large Language Models

JD Chang, K Brantley, R Ramamurthy, D Misra, W Sun - openreview.net

Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large
Language Models (LLMs) for text generation. In particular, recent LLMs such as ChatGPT …
