Active preference-based learning of reward functions

S Cabi, SG Colmenarejo, A Novikov… - arXiv preprint arXiv …, 2019 - arxiv.org

We present a framework for data-driven robotics that makes use of a large dataset of
recorded robot experience and scales to several tasks using learned reward functions. We …

Cited by 136 · Related articles · All 2 versions

[PDF] arxiv.org

Asking easy questions: A user-friendly approach to active reward learning

E Bıyık, M Palan, NC Landolfi, DP Losey… - arXiv preprint arXiv …, 2019 - arxiv.org

Robots can learn the right reward function by querying a human expert. Existing approaches
attempt to choose questions where the robot is most uncertain about the human's response; …

Cited by 117 · Related articles · All 8 versions

[PDF] aaai.org

Machine teaching for inverse reinforcement learning: Algorithms and applications

DS Brown, S Niekum - Proceedings of the AAAI Conference on Artificial …, 2019 - ojs.aaai.org

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing
for policy improvement and generalization. However, despite much recent interest in IRL …

Cited by 93 · Related articles · All 12 versions

[PDF] arxiv.org

Interactive teaching algorithms for inverse reinforcement learning

P Kamalaruban, R Devidze, V Cevher… - arXiv preprint arXiv …, 2019 - arxiv.org

We study the problem of inverse reinforcement learning (IRL) with the added twist that the
learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic …

Cited by 73 · Related articles · All 9 versions

[PDF] acm.org

Incomplete contracting and AI alignment

D Hadfield-Menell, GK Hadfield - Proceedings of the 2019 AAAI/ACM …, 2019 - dl.acm.org

We suggest that the analysis of incomplete contracting developed by law and economics
researchers can provide a useful framework for understanding the AI alignment problem and …

Cited by 78 · Related articles · All 12 versions

[PDF] arxiv.org

Preferences implicit in the state of the world

R Shah, D Krasheninnikov, J Alexander… - arXiv preprint arXiv …, 2019 - arxiv.org

Reinforcement learning (RL) agents optimize only the features specified in a reward function
and are indifferent to anything left out inadvertently. This means that we must not only …

Cited by 68 · Related articles · All 6 versions

[PDF] arxiv.org

Batch active learning using determinantal point processes

E Bıyık, K Wang, N Anari, D Sadigh - arXiv preprint arXiv:1906.07975, 2019 - arxiv.org

Data collection and labeling is one of the main challenges in employing machine learning
algorithms in a variety of real-world applications with limited data. While active learning …

Cited by 46 · Related articles · All 5 versions

[PDF] nsf.gov

Active learning of reward dynamics from hierarchical queries

C Basu, E Bıyık, Z He, M Singhal… - 2019 IEEE/RSJ …, 2019 - ieeexplore.ieee.org

Enabling robots to act according to human preferences across diverse environments is a
crucial task, extensively studied by both roboticists and machine learning researchers. To …

Cited by 39 · Related articles · All 7 versions

[PDF] arxiv.org

The green choice: Learning and influencing human decisions on shared roads

E Bıyık, DA Lazar, D Sadigh… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org

Autonomous vehicles have the potential to increase the capacity of roads via platooning,
even when human drivers and autonomous vehicles share roads. However, when users of a …

Cited by 34 · Related articles · All 10 versions

[PDF] escholarship.org

Verifying robustness of human-aware autonomous cars

D Sadigh, SS Sastry, SA Seshia - IFAC-PapersOnLine, 2019 - Elsevier

As human-robot systems make their ways into our every day life, safety has become a core
concern of the learning algorithms used by such systems. Examples include semi …

Cited by 30 · Related articles · All 3 versions
