Boyi Liu
Verified email at u.northwestern.edu
Title
Cited by
Year
Neural trust region/proximal policy optimization attains globally optimal policy
B Liu, Q Cai, Z Yang, Z Wang
Advances in neural information processing systems 32, 2019
Cited by 207 · 2019
Reason for future, act for now: A principled framework for autonomous LLM agents with provable sample efficiency
Z Liu, H Hu, S Zhang, H Guo, S Ke, B Liu, Z Wang
arXiv preprint arXiv:2309.17382, 2023
Cited by 21* · 2023
Off-policy evaluation and learning from logged bandit feedback: Error reduction via surrogate policy
Y Xie, B Liu, Q Liu, Z Wang, Y Zhou, J Peng
International Conference on Learning Representations, 2018
Cited by 20 · 2018
Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence
B Liu, J Li, Z Yang, HT Wai, M Hong, Y Nie, Z Wang
Advances in Neural Information Processing Systems, 2022
Cited by 15* · 2022
Differentiable bilevel programming for Stackelberg congestion games
J Li, J Yu, Q Wang, B Liu, Z Wang, YM Nie
arXiv preprint arXiv:2209.07618, 2022
Cited by 13 · 2022
Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL
F Zhang, B Liu, K Wang, VYF Tan, Z Yang, Z Wang
Advances in Neural Information Processing Systems, 2022
Cited by 8 · 2022
An analysis of attention via the lens of exchangeability and latent variable models
Y Zhang, B Liu, Q Cai, L Wang, Z Wang
arXiv preprint arXiv:2212.14852, 2022
Cited by 7 · 2022
Let models speak ciphers: Multiagent debate through embeddings
C Pham, B Liu, Y Yang, Z Chen, T Liu, J Yuan, BA Plummer, Z Wang, ...
arXiv preprint arXiv:2310.06272, 2023
Cited by 5 · 2023
Policy Optimization in Zero-Sum Markov Games: Fictitious Self-Play Provably Attains Nash Equilibria
B Liu, Z Yang, Z Wang
Cited by 3 · 2020
Provably mitigating overoptimization in RLHF: Your SFT loss is implicitly an adversarial regularizer
Z Liu, M Lu, S Zhang, B Liu, H Guo, Y Yang, J Blanchet, Z Wang
arXiv preprint arXiv:2405.16436, 2024
Cited by 2 · 2024
Model-based reparameterization policy gradient methods: Theory and practical algorithms
S Zhang, B Liu, Z Wang, T Zhao
Advances in Neural Information Processing Systems 36, 2024
Cited by 2 · 2024
Achieving hierarchy-free approximation for bilevel programs with equilibrium constraints
J Li, J Yu, B Liu, Y Nie, Z Wang
International Conference on Machine Learning, 20312-20335, 2023
Cited by 2 · 2023
Differentiable Arbitrating in Zero-sum Markov Games
J Wang, M Song, F Gao, B Liu, Z Wang, Y Wu
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2023
Cited by 1 · 2023
(N, K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model
Y Zhang, L Chen, B Liu, Y Yang, Q Cui, Y Tao, H Yang
arXiv preprint arXiv:2403.07191, 2024
2024
Double duality: variational primal-dual policy optimization for constrained reinforcement learning
Z Li, B Liu, Z Yang, Z Wang, M Wang
Journal of Machine Learning Research 24 (385), 1-43, 2023
2023
BooVI: provably efficient bootstrapped value iteration
B Liu, Q Cai, Z Yang, Z Wang
Advances in Neural Information Processing Systems 34, 7041-7053, 2021
2021
Articles 1–16