Deep neural-based vulnerability discovery demystified: data, model and performance

G Lin, W Xiao, LY Zhang, S Gao, Y Tai… - Neural Computing and …, 2021 - Springer
Neural Computing and Applications, 2021, Springer
Abstract
Detecting source-code-level vulnerabilities at the development phase is a cost-effective way to prevent potential attacks at the software deployment stage. Many machine learning-based solutions, including deep learning-based ones, have been proposed to aid the process of vulnerability discovery. However, these approaches were mainly evaluated on self-constructed or self-collected datasets, and without a unified baseline dataset it is difficult to compare their effectiveness. To bridge this gap, we construct a function-level vulnerability dataset from scratch, provided as pairs of source code and labels. To evaluate the constructed dataset, we build a function-level vulnerability detection framework that incorporates six mainstream neural network models as vulnerability detectors. We perform experiments to investigate the performance of the neural model-based detectors, using source code as raw input with continuous Bag-of-Words neural embeddings. Empirical results reveal that the variants of recurrent neural networks and the convolutional neural network perform well on our dataset, as the former are capable of handling contextual information and the latter learns features from small context windows. In terms of generalization ability, the fully connected network outperforms the other network architectures. The performance evaluation can serve as a reference benchmark for neural model-based vulnerability detection at function-level granularity. Our dataset can serve as ground truth for ML-based function-level vulnerability detection and as a baseline for evaluating relevant approaches.
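As an illustration of the input representation mentioned in the abstract, the following is a minimal sketch of how a function's source code might be split into tokens and turned into continuous Bag-of-Words training pairs, where the surrounding tokens are used to predict the centre token. The tokenizer regex, the function names `tokenize` and `cbow_pairs`, the window size, and the sample function are all assumptions made for illustration; the abstract does not specify the authors' actual preprocessing.

```python
import re

# Hypothetical tokenizer (an assumption, not the authors' preprocessing):
# identifiers/keywords, integer literals, then any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split a source-code string into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def cbow_pairs(tokens, window=2):
    """Build (context, target) pairs in the continuous Bag-of-Words style.

    For each position, the context is up to `window` tokens on each side
    and the target is the token at that position.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip the degenerate case of a single-token input
            pairs.append((context, target))
    return pairs


# Illustrative function body, not taken from the dataset.
sample = "int f(char *s) { return strlen(s); }"
tokens = tokenize(sample)
pairs = cbow_pairs(tokens, window=2)
```

In a full pipeline, pairs like these would train an embedding layer whose vectors are then fed to the detector networks; that training step is omitted here.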