Boosting the phishing detection performance by semantic analysis

X Zhang, Y Zeng, XB Jin, ZW Yan, GG Geng
2017 IEEE International Conference on Big Data (Big Data), 2017. ieeexplore.ieee.org
Phishing has grown increasingly severe in recent years and seriously threatens the privacy and property of Internet users. Phishing is essentially brand counterfeiting: to deceive victims effectively, phishing sites are made visually and semantically very similar to the legitimate sites they imitate. Machine learning has become the mainstream approach to anti-phishing, and the effectiveness of such models hinges on the statistical features extracted. However, these features mainly capture visual similarity, information theft, and third-party services, and ignore the semantic information of web pages. We therefore extract a series of semantic features with word2vec to better characterize phishing sites, and fuse them with other multi-scale statistical features to construct a more robust phishing detection model. Experimental results on real-world data sets show that the majority of phishing websites can be identified effectively using only the semantic features mined from word embeddings. The detection models based on fused features achieved the best results, which shows that semantic features and the other statistical features complement each other well. The proposed method offers a promising way to improve phishing detection performance in a real Internet environment.
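
The idea of fusing word-embedding features with statistical features can be illustrated with a minimal sketch. The abstract does not say how word vectors are pooled or how the fusion is done, so everything here is an assumption made for illustration: the toy embedding table (standing in for a trained word2vec model), mean pooling over page tokens, fusion by simple concatenation, and the names `TOY_EMBEDDINGS`, `semantic_features`, and `fuse_features`.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embedding table standing in for a trained word2vec model.
# The 3-dimensional vectors are made up for illustration only.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "login": [0.9, 0.1, 0.0],
    "bank": [0.8, 0.2, 0.1],
    "verify": [0.7, 0.3, 0.0],
    "news": [0.0, 0.1, 0.9],
}


def semantic_features(
    tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Mean-pool the embeddings of known tokens (an assumed pooling choice).

    Tokens missing from the table are skipped; if none are known,
    a zero vector of length `dim` is returned.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fuse_features(
    semantic: Sequence[float], statistical: Sequence[float]
) -> List[float]:
    """Fuse the two feature groups by concatenation (an assumed fusion choice)."""
    return list(semantic) + list(statistical)


# Tokens from a hypothetical page; "unknownword" is not in the table.
page_tokens = ["login", "bank", "unknownword"]
# Hypothetical statistical features, e.g. "has password form", "uses HTTPS".
statistical = [1.0, 0.0]

semantic = semantic_features(page_tokens, TOY_EMBEDDINGS, dim=3)
fused = fuse_features(semantic, statistical)
print(fused)
```

The fused vector would then be passed to whatever classifier is used; the choice of classifier is left open here because the abstract does not name one.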