MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark. H Liu, Z Zheng, Y Qiao, H Duan, Z Fei, F Zhou, W Zhang, S Zhang, D Lin, ... arXiv preprint arXiv:2405.12209, 2024 | 1 | 2024 |
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs. Y Qiao, H Duan, X Fang, J Yang, L Chen, S Zhang, J Wang, D Lin, ... arXiv preprint arXiv:2406.14544, 2024 | | 2024 |