Evaluating language models for efficient code generation

J Liu, S Xie, J Wang, Y Wei, Y Ding, L Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably
evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding …

Reasoning runtime behavior of a program with LLM: How far are we?

J Chen, Z Pan, X Hu, Z Li, G Li, X Xia - arXiv preprint cs.SE …, 2024 - ginolzh.github.io
Large language models for code (i.e., code LLMs) have shown strong code understanding
and generation capabilities. To evaluate the capabilities of code LLMs in various aspects …

SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications

L Ma, S Liu, L Bu, S Li, Y Wang, Y Liu - arXiv preprint arXiv:2409.12866, 2024 - arxiv.org
Large Language Models have achieved impressive performance in automated software
engineering. Extensive efforts have been made to evaluate the abilities of code LLMs in …

Best practices and lessons learned on synthetic data for language models

R Liu, J Wei, F Liu, C Si, Y Zhang, J Rao… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of AI models relies on the availability of large, diverse, and high-quality
datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and …

LiveCodeBench: Holistic and contamination free evaluation of large language models for code

N Jain, K Han, A Gu, WD Li, F Yan, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) applied to code-related applications have emerged as a
prominent field, attracting significant interest from both academia and industry. However, as …

Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

F Lin, E La Malfa, V Hofmann, EM Yang, A Cohn… - arXiv preprint arXiv …, 2024 - arxiv.org
Reasoning about asynchronous plans is challenging since it requires sequential and
parallel planning to optimize time costs. Can large language models (LLMs) succeed at this …

Neurosymbolic Repair of Test Flakiness

Y Chen, R Jabbarvand - Proceedings of the 33rd ACM SIGSOFT …, 2024 - dl.acm.org
Test flakiness, a non-deterministic behavior of builds irrelevant to code changes, is a major
and continuing impediment to delivering reliable software. The very few techniques for the …

Uncovering Weaknesses in Neural Code Generation

X Lian, S Wang, J Ma, F Liu, X Tan, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Code generation, the task of producing source code from prompts, has seen significant
advancements with the advent of pre-trained large language models (PLMs). Despite these …

Design and Optimization of Heat Exchangers Using Large Language Models

S Mishra, VS Jadhav, S Karande… - Fourth Workshop on …, 2024 - ceur-ws.org
Heat exchangers (HEs) are essential in process industries for efficient thermal energy
transfer. Their design and optimization are crucial for improving energy efficiency, reducing …

SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

Y Yang, Y Nie, Z Wang, Y Tang, W Guo, B Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing works have established multiple benchmarks to highlight the security risks
associated with Code GenAI. These risks are primarily reflected in two areas: a model …