| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | A Setlur, S Garg, X Geng, N Garg, V Smith, A Kumar | arXiv preprint arXiv:2406.14532, 2024 | | 2024 |
| DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | H Bai, Y Zhou, M Cemri, J Pan, A Suhr, S Levine, A Kumar | arXiv preprint arXiv:2406.11896, 2024 | | 2024 |
| Is Value Learning Really the Main Bottleneck in Offline RL? | S Park, K Frans, S Levine, A Kumar | arXiv preprint arXiv:2406.09329, 2024 | | 2024 |
| Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | F Tajwar, A Singh, A Sharma, R Rafailov, J Schneider, T Xie, S Ermon, ... | arXiv preprint arXiv:2404.14367, 2024 | 11 | 2024 |
| Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context | M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... | arXiv preprint arXiv:2403.05530, 2024 | 108 | 2024 |
| Unfamiliar Finetuning Examples Control How Language Models Hallucinate | K Kang, E Wallace, C Tomlin, A Kumar, S Levine | arXiv preprint arXiv:2403.05612, 2024 | 7 | 2024 |
| Stop Regressing: Training Value Functions via Classification for Scalable Deep RL | J Farebrother, J Orbay, Q Vuong, AA Taïga, Y Chebotar, T Xiao, A Irpan, ... | arXiv preprint arXiv:2403.03950, 2024 | 7 | 2024 |
| ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Y Zhou, A Zanette, J Pan, S Levine, A Kumar | arXiv preprint arXiv:2402.19446, 2024 | 6 | 2024 |
| Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | M Nakamoto, S Zhai, A Singh, M Sobol Mark, Y Ma, C Finn, A Kumar, ... | Advances in Neural Information Processing Systems 36, 2024 | 57 | 2024 |
| Vision-Language Models Provide Promptable Representations for Reinforcement Learning | W Chen, O Mees, A Kumar, S Levine | arXiv preprint arXiv:2402.02651, 2024 | 4 | 2024 |
| Gemini: A Family of Highly Capable Multimodal Models | G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... | arXiv preprint arXiv:2312.11805, 2023 | 816 | 2023 |
| Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets | ZW Hong, A Kumar, S Karnik, A Bhandwaldar, A Srivastava, J Pajarinen, ... | Advances in Neural Information Processing Systems 36, 4985-5009, 2023 | 6 | 2023 |
| Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions | Y Chebotar, Q Vuong, K Hausman, F Xia, Y Lu, A Irpan, A Kumar, T Yu, ... | Conference on Robot Learning, 3909-3928, 2023 | 38 | 2023 |
| Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning | J Luo, P Dong, J Wu, A Kumar, X Geng, S Levine | Conference on Robot Learning, 1348-1361, 2023 | 10 | 2023 |
| Scaling Offline Q-Learning with Vision Transformers | Y Miao, J Orbay, R Agarwal, A Kumar, G Tucker, A Faust | NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023 | | 2023 |
| Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models | K Black, M Nakamoto, P Atreya, H Walke, C Finn, A Kumar, S Levine | arXiv preprint arXiv:2310.10639, 2023 | 25 | 2023 |
| Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction | H Qi, X Geng, S Rando, I Ohama, A Kumar, S Levine | arXiv preprint arXiv:2310.10056, 2023 | 2* | 2023 |
| Robotic Offline RL from Internet Videos via Value-Function Pre-Training | C Bhateja, D Guo, D Ghosh, A Singh, M Tomar, Q Vuong, Y Chebotar, ... | arXiv preprint arXiv:2309.13041, 2023 | 10 | 2023 |
| Efficient Deep Reinforcement Learning Requires Regulating Overfitting | Q Li, A Kumar, I Kostrikov, S Levine | arXiv preprint arXiv:2304.10466, 2023 | 23 | 2023 |
| Don't Start from Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning | HR Walke, JH Yang, A Yu, A Kumar, J Orbik, A Singh, S Levine | Conference on Robot Learning, 1652-1662, 2023 | 29 | 2023 |