Qampari: An open-domain question answering benchmark for questions with many answers from multiple paragraphs SJ Amouyal, T Wolfson, O Rubin, O Yoran, J Herzig, J Berant arXiv preprint arXiv:2205.12665, 2022 | 37* | 2022 |
Large Language Models for Psycholinguistic Plausibility Pretesting SJ Amouyal, A Meltzer-Asscher, J Berant arXiv preprint arXiv:2402.05455, 2024 | 4 | 2024 |
STEER: Assessing the Economic Rationality of Large Language Models NK Raman, T Lundy, SJ Amouyal, Y Levine, K Leyton-Brown, ... Forty-first International Conference on Machine Learning, 2024 | 3 | 2024 |
Rationality Report Cards: Assessing the Economic Rationality of Large Language Models N Raman, T Lundy, S Amouyal, Y Levine, K Leyton-Brown, ... arXiv preprint arXiv:2402.09552, 2024 | 2 | 2024 |
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments E Shapira, O Madmon, I Reinman, SJ Amouyal, R Reichart, ... arXiv preprint arXiv:2410.05254, 2024 | | 2024 |
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? O Yoran, SJ Amouyal, C Malaviya, B Bogin, O Press, J Berant arXiv preprint arXiv:2407.15711, 2024 | | 2024 |