Authors
Bo Cui, Jinling Li, Wenhan Hou
Publication date
2023/10/21
Journal
Web Information Systems Engineering–WISE 2023: 24th International Conference, Melbourne, VIC, Australia, October 25–27, 2023, Proceedings
Volume
14306
Pages
189
Publisher
Springer Nature
Description
As the cyber security landscape grows more complex, the mining and analysis of Cyber Threat Intelligence (CTI) has become a prominent focus in the field. Social media platforms such as Twitter, with their timeliness and extensive coverage, have become valuable data sources for cyber security. However, these data often contain a substantial amount of invalid and interfering content, which makes it difficult for existing deep learning models to identify critical CTI. To address this issue, we propose a novel automatic CTI extraction model, called ATDG, designed for detecting cyber security text and extracting cyber threat entities. Specifically, our model uses a Deep Pyramid Convolutional Neural Network (DPCNN) and BiGRU to extract character-level and word-level features from the text, capturing semantic information at different levels and effectively mitigating the out-of-vocabulary (OOV) problem in threat intelligence. Additionally, we introduce a self-attention mechanism at the encoding layer, which dynamically adjusts the attention given to different features so that the model focuses on key features and performs better. Furthermore, to address imbalanced sample distribution, we incorporate Focal Loss into ATDG, enhancing the model's ability to handle data imbalance. Experimental results demonstrate that ATDG (92.49% F1-score and 93.07% F1-score) outperforms state-of-the-art methods in both tasks, and also demonstrate the effectiveness of the self-attention mechanism and Focal Loss.
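The abstract names Focal Loss as the remedy for imbalanced samples but does not give its form. As a rough illustration only, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), next to plain cross-entropy. The function names and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; they are not taken from ATDG, whose exact loss configuration is not stated here.

```python
import math


def cross_entropy(prob_positive, label):
    """Plain binary cross-entropy for one example."""
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -math.log(p_t)


def focal_loss(prob_positive, label, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for one example.

    prob_positive: predicted probability of the positive class.
    label: 1 for positive, 0 for negative.
    gamma, alpha: illustrative defaults, NOT values from the ATDG paper.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With `gamma=0` the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy. Larger `gamma` values shift more of the training signal toward hard, often minority-class, examples.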