Generating Exceptional Behavior Tests with Reasoning Augmented Large Language Models

J Zhang, Y Liu, P Nie, JJ Li, M Gligoric - arXiv preprint arXiv:2405.14619, 2024 - arxiv.org
Many popular programming languages, including C#, Java, and Python, support exceptions.
Exceptions are thrown during program execution if an unwanted event happens, e.g., a …
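As a concrete illustration of such an exceptional behavior test, here is a minimal sketch in C++ with GoogleTest; the Stack class and its pop-on-empty contract are hypothetical, invented for this example:

```cpp
#include <stdexcept>
#include <vector>
#include <gtest/gtest.h>

// Hypothetical stack whose pop() documents an exceptional contract:
// it throws std::out_of_range when the stack is empty.
class Stack {
public:
    void push(int v) { data_.push_back(v); }
    int pop() {
        if (data_.empty()) throw std::out_of_range("pop on empty stack");
        int v = data_.back();
        data_.pop_back();
        return v;
    }
private:
    std::vector<int> data_;
};

// An exceptional behavior test: rather than checking a return value,
// it asserts that the documented exception is actually thrown.
TEST(StackTest, PopOnEmptyThrows) {
    Stack s;
    EXPECT_THROW(s.pop(), std::out_of_range);
}
```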

No offense taken: Eliciting offensiveness from language models

A Srivastava, R Ahuja, R Mukku - arXiv preprint arXiv:2310.00892, 2023 - arxiv.org
This work was completed in May 2022. For safe and reliable deployment of language
models in the real world, testing needs to be robust. This robustness can be characterized …

Inference of robust reachability constraints

Y Sellami, G Girol, F Recoules, D Couroussé… - Proceedings of the …, 2024 - dl.acm.org
Characterization of bugs and attack vectors is in many practical scenarios as important as
finding them. Recently, Girol et al. have introduced the concept of robust reachability, which …
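For reference, robust reachability strengthens ordinary reachability by partitioning inputs into controlled and uncontrolled ones; the a/x notation below is an assumption for illustration, not necessarily the paper's:

```latex
% Ordinary reachability of a target location L:
%   some input reaches L:           \exists a\, x .\ run(a, x) \text{ reaches } L
% Robust reachability: the controlled inputs a must reach L
% regardless of the uncontrolled inputs x:
\exists a .\ \forall x .\ \mathit{run}(a, x) \text{ reaches } L
```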

Measuring enforcement windows with symbolic trace interpretation: What well-behaved programs say

D Coughlin, BYE Chang, A Diwan, JG Siek - Proceedings of the 2012 …, 2012 - dl.acm.org
A static analysis design is sufficient if it can prove the property of interest with an acceptable
number of false alarms. Ultimately, the only way to confirm that an analysis design is …

Detecting Exception Handling Bugs in C++ Programs

H Zhang, J Luo, M Hu, J Yan… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Exception handling is an error-handling mechanism provided by modern programming
languages. Studies have shown that exception handling code is error-prone. However, there is still limited …
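One classic bug pattern in this class, sketched as a hypothetical C++ example (not drawn from the paper's subjects): rethrowing a caught exception by copy with `throw e;` instead of `throw;`, which slices away its dynamic type:

```cpp
#include <iostream>
#include <stdexcept>

void process() {
    throw std::runtime_error("disk full");  // dynamic type: runtime_error
}

void buggyWrapper() {
    try {
        process();
    } catch (std::exception& e) {
        // BUG: 'throw e;' copies e as its static type std::exception,
        // slicing off the runtime_error part; callers that catch
        // std::runtime_error no longer see this error.
        throw e;            // should be:  throw;
    }
}

int main() {
    try {
        buggyWrapper();
    } catch (std::runtime_error&) {
        std::cout << "caught runtime_error\n";   // never reached
    } catch (std::exception& e) {
        std::cout << "caught sliced exception: " << e.what() << "\n";
    }
}
```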

Towards automatic exception safety verification

X Li, HJ Hoover, P Rudnicki - FM 2006: Formal Methods: 14th International …, 2006 - Springer
Many programming languages provide exceptions as a structured way for detecting and
recovering from abnormal conditions. However, using exceptions properly is non-trivial …
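For instance, the strong exception safety guarantee (an operation either completes or leaves the object unchanged) is commonly obtained in C++ via the copy-and-swap idiom; a minimal hypothetical sketch:

```cpp
#include <vector>

// Hypothetical config holder illustrating the strong guarantee:
// update() either fully succeeds or leaves *this untouched.
class Config {
public:
    void update(const std::vector<int>& newValues) {
        // Do all throwing work (the copy) on a local object first...
        std::vector<int> tmp = newValues;   // may throw std::bad_alloc
        // ...then commit with a non-throwing operation.
        values_.swap(tmp);                  // noexcept: cannot fail
    }
private:
    std::vector<int> values_;
};

int main() {
    Config c;
    c.update({1, 2, 3});   // either succeeds fully or leaves c unchanged
}
```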

LLM-Powered Test Case Generation for Detecting Tricky Bugs

K Liu, Y Liu, Z Chen, JM Zhang, Y Han, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Conventional automated test generation tools struggle to generate test oracles and tricky
bug-revealing test inputs. Large Language Models (LLMs) can be prompted to produce test …

Avoiding data contamination in language model evaluation: Dynamic test construction with latest materials

Y Li, F Geurin, C Lin - arXiv preprint arXiv:2312.12343, 2023 - arxiv.org
Data contamination in evaluation is becoming increasingly prevalent with the emergence of
language models pre-trained on very large, automatically crawled corpora. This problem …

Large Language Models as Test Case Generators: Performance Evaluation and Enhancement

K Li, Y Yuan - arXiv preprint arXiv:2404.13340, 2024 - arxiv.org
Code generation with Large Language Models (LLMs) has been extensively studied and
achieved remarkable progress. As a complementary aspect to code generation, test case …

Out of sight, out of place: Detecting and assessing swapped arguments

R Scott, J Ranieri, L Kot… - 2020 IEEE 20th …, 2020 - ieeexplore.ieee.org
Programmers often add meaningful information about program semantics when naming
program entities such as variables, functions, and macros. However, static analysis tools …
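A hypothetical sketch of the bug class: a call site whose argument names contradict the callee's parameter names, which a name-based swapped-argument detector can flag (the function and variable names here are invented):

```cpp
#include <algorithm>
#include <cstddef>

// Callee: parameter names carry semantics (destination first, source
// second), mirroring the memcpy convention.
void copyBuffer(char* dst, const char* src, std::size_t len) {
    std::copy(src, src + len, dst);
}

int main() {
    char src[16] = "hello";
    char dst[16] = {0};
    // BUG: arguments swapped. The variable named 'src' is passed to the
    // parameter named 'dst' and vice versa; a name-based detector would
    // report the mismatch at this call site.
    copyBuffer(src, dst, 6);   // should be: copyBuffer(dst, src, 6);
}
```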