- Academic resource search

Mathvista: Evaluating mathematical reasoning of foundation models in visual contexts

P Lu, H Bansal, T Xia, J Liu, C Li, H Hajishirzi… - arXiv preprint arXiv …, 2023 - arxiv.org

Although Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit
impressive skills in various domains, their ability for mathematical reasoning within visual …

Cited by 363 · Related articles · All 3 versions

Lawbench: Benchmarking legal knowledge of large language models

Z Fei, X Shen, D Zhu, F Zhou, Z Han, S Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have demonstrated strong capabilities in various aspects.
However, when applying them to the highly specialized, safe-critical legal domain, it is …

Cited by 72 · Related articles · All 3 versions

CyberMetric: a benchmark dataset based on retrieval-augmented generation for evaluating LLMs in cybersecurity knowledge

N Tihanyi, MA Ferrag, R Jain, T Bisztray… - … Conference on Cyber …, 2024 - ieeexplore.ieee.org

Large Language Models (LLMs) are increasingly used across various domains, from
software development to cyber threat intelligence. Understanding all the different …

Cited by 14 · Related articles

Where Are Large Language Models for Code Generation on GitHub?

X Yu, L Liu, X Hu, JW Keung, J Liu, X Xia - arXiv preprint arXiv:2406.19544, 2024 - arxiv.org

The increasing use of Large Language Models (LLMs) in software development has
garnered significant attention from researchers assessing the quality of the code they …

Cited by 5 · Related articles · All 3 versions

CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation

Y Xu, Y Chen, X Zhang, X Lin, P Hu… - Proceedings of …, 2024 - proceedings.mlsys.org

Among the thriving ecosystem of cloud computing and the proliferation of Large Language
Model (LLM)-based code generation tools, there is a lack of benchmarking for code …

Cited by 11 · Related articles · All 5 versions

Polymath: A challenging multi-modal mathematical reasoning benchmark

H Gupta, S Verma, U Anantheswaran, K Scaria… - arXiv preprint arXiv …, 2024 - arxiv.org

Multi-modal Large Language Models (MLLMs) exhibit impressive problem-solving abilities
in various domains, but their visual comprehension and abstract reasoning skills remain …

Cited by 3 · Related articles · All 2 versions

From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions

W Jiang, X Gao, J Zhai, S Ma, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Code Generation Models (LCGMs) have garnered significant attention and achieved
promising results across various programming tasks. However, concerns arise regarding …

Cited by 1 · Related articles · All 2 versions

VersiCode: Towards Version-controllable Code Generation

T Wu, W Wu, X Wang, K Xu, S Ma, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org

Significant research has focused on improving the performance of large language model on
code-related tasks due to their practical importance. Although performance is typically …

Cited by 4 · Related articles · All 2 versions

[HTML] Assessing and Optimizing Large Language Models on Spondyloarthritis Multi-Choice Question Answering: Protocol for Enhancement and Assessment

A Wang, Y Wu, X Ji, X Wang, J Hu… - JMIR Research …, 2024 - researchprotocols.org

Background Spondyloarthritis (SpA), a chronic inflammatory disorder, predominantly
impacts the sacroiliac joints and spine, significantly escalating the risk of disability. SpA's …

[PDF] Multi-Intent Inline Code Comment Generation via Large Language Model

X Zhang, Z Chen, Y Cao, L Chen… - International Journal of …, 2024 - researchgate.net

Comment generation (aka code summarization) refers to the process of generating concise
and fluent natural language descriptions for a piece of code [1–4]. It is considered a …
