Generative reward models D Mahan, D Van Phung, R Rafailov, C Blagden, N Lile, L Castricato, ... arXiv preprint arXiv:2410.12832, 2024 | 5 | 2024 |
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought V Xiang, C Snell, K Gandhi, A Albalak, A Singh, C Blagden, D Phung, ... arXiv preprint arXiv:2501.04682, 2025 | | 2025 |