Aviral Kumar
Google DeepMind
Verified email at berkeley.edu - Homepage
Title
Cited by
Year
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
A Setlur, S Garg, X Geng, N Garg, V Smith, A Kumar
arXiv preprint arXiv:2406.14532, 2024
2024
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
H Bai, Y Zhou, M Cemri, J Pan, A Suhr, S Levine, A Kumar
arXiv preprint arXiv:2406.11896, 2024
2024
Is Value Learning Really the Main Bottleneck in Offline RL?
S Park, K Frans, S Levine, A Kumar
arXiv preprint arXiv:2406.09329, 2024
2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
F Tajwar, A Singh, A Sharma, R Rafailov, J Schneider, T Xie, S Ermon, ...
arXiv preprint arXiv:2404.14367, 2024
11 · 2024
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ...
arXiv preprint arXiv:2403.05530, 2024
108 · 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
K Kang, E Wallace, C Tomlin, A Kumar, S Levine
arXiv preprint arXiv:2403.05612, 2024
7 · 2024
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
J Farebrother, J Orbay, Q Vuong, AA Taïga, Y Chebotar, T Xiao, A Irpan, ...
arXiv preprint arXiv:2403.03950, 2024
7 · 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Y Zhou, A Zanette, J Pan, S Levine, A Kumar
arXiv preprint arXiv:2402.19446, 2024
6 · 2024
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
M Nakamoto, S Zhai, A Singh, M Sobol Mark, Y Ma, C Finn, A Kumar, ...
Advances in Neural Information Processing Systems 36, 2024
57 · 2024
Vision-Language Models Provide Promptable Representations for Reinforcement Learning
W Chen, O Mees, A Kumar, S Levine
arXiv preprint arXiv:2402.02651, 2024
42024
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...
arXiv preprint arXiv:2312.11805, 2023
816 · 2023
Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets
ZW Hong, A Kumar, S Karnik, A Bhandwaldar, A Srivastava, J Pajarinen, ...
Advances in Neural Information Processing Systems 36, 4985-5009, 2023
6 · 2023
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Y Chebotar, Q Vuong, K Hausman, F Xia, Y Lu, A Irpan, A Kumar, T Yu, ...
Conference on Robot Learning, 3909-3928, 2023
38 · 2023
Action-quantized offline reinforcement learning for robotic skill learning
J Luo, P Dong, J Wu, A Kumar, X Geng, S Levine
Conference on Robot Learning, 1348-1361, 2023
10 · 2023
Scaling Offline Q-Learning with Vision Transformers
Y Miao, J Orbay, R Agarwal, A Kumar, G Tucker, A Faust
NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023
2023
Zero-shot robotic manipulation with pretrained image-editing diffusion models
K Black, M Nakamoto, P Atreya, H Walke, C Finn, A Kumar, S Levine
arXiv preprint arXiv:2310.10639, 2023
25 · 2023
Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction
H Qi, X Geng, S Rando, I Ohama, A Kumar, S Levine
arXiv preprint arXiv:2310.10056, 2023
2* · 2023
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
C Bhateja, D Guo, D Ghosh, A Singh, M Tomar, Q Vuong, Y Chebotar, ...
arXiv preprint arXiv:2309.13041, 2023
10 · 2023
Efficient deep reinforcement learning requires regulating overfitting
Q Li, A Kumar, I Kostrikov, S Levine
arXiv preprint arXiv:2304.10466, 2023
23 · 2023
Don’t start from scratch: Leveraging prior data to automate robotic reinforcement learning
HR Walke, JH Yang, A Yu, A Kumar, J Orbik, A Singh, S Levine
Conference on Robot Learning, 1652-1662, 2023
29 · 2023
Articles 1–20