Today's world is highly interconnected, owing to the pervasiveness of small personal devices (e.g., smartphones) as well as large computing devices and services (e.g., cloud …
Despite efforts to align large language models (LLMs) with human values, widely used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
Adversarial attacks are carried out to reveal the vulnerabilities of deep neural networks. Crafting textual adversarial attacks is challenging because text is discrete and a small perturbation …
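The discreteness mentioned in this snippet can be illustrated with a toy word-substitution attack. The keyword-based classifier and synonym table below are hypothetical stand-ins invented for illustration, not a model or candidate set from any of these papers; real attacks query a neural model and build candidates from a thesaurus or embedding neighborhoods:

```python
# Minimal sketch of a word-substitution adversarial attack on a toy
# sentiment classifier. Classifier and synonym table are hypothetical.

# Toy classifier: counts positive keywords.
POSITIVE = {"great", "good", "excellent", "wonderful"}

def classify(tokens):
    score = sum(1 for t in tokens if t in POSITIVE)
    return "positive" if score > 0 else "negative"

# Hypothetical synonym candidates (an assumption for this sketch).
SYNONYMS = {"great": ["fine", "solid"], "movie": ["film"]}

def attack(tokens):
    """Greedily try one discrete word swap at a time until the label flips."""
    tokens = list(tokens)
    original = classify(tokens)
    for i, tok in enumerate(tokens):
        for cand in SYNONYMS.get(tok, []):
            trial = tokens[:i] + [cand] + tokens[i + 1:]
            if classify(trial) != original:
                return trial  # a single discrete substitution flipped the label
    return None

print(attack(["a", "great", "movie"]))  # → ['a', 'fine', 'movie']
```

Unlike images, there is no continuous neighborhood to perturb within: each candidate is a discrete jump to a different token, which is exactly what makes gradient-based attacks hard to transfer to text.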
X Wang, H Wang, D Yang - arXiv preprint arXiv:2112.08313, 2021 - arxiv.org
As NLP models have achieved state-of-the-art performance on benchmarks and gained wide application, it has become increasingly important to ensure the safe deployment of these …
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques of …
S Qiu, Q Liu, S Zhou, W Huang - Neurocomputing, 2022 - Elsevier
Recently, adversarial attack and defense techniques have made remarkable progress and have been widely applied in the computer vision field, promoting its rapid …
Deep neural networks (DNNs) have achieved remarkable success in various tasks (e.g., image classification, speech recognition, and natural language processing (NLP)). However …
X Wang, Y Yang, Y Deng, K He - … of the AAAI conference on artificial …, 2021 - ojs.aaai.org
Adversarial training is empirically the most successful approach to improving the robustness of deep neural networks for image classification. For text classification, however, existing …
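As a drastically simplified illustration of the adversarial-training loop this snippet refers to, here is a sketch using FGSM-style perturbations on a logistic-regression model. The data, step size, and perturbation budget are all assumed toy values; for text models, the perturbation would be applied in embedding space rather than to raw inputs:

```python
import numpy as np

# Sketch of adversarial training with FGSM perturbations on a toy
# logistic-regression model (all hyperparameters are assumed values).

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linearly separable data.
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = (X @ w_true > 0).astype(float)

w = np.zeros(5)
lr, eps = 0.1, 0.1  # step size and FGSM perturbation budget (assumed)

for _ in range(300):
    # FGSM: perturb inputs in the direction that increases the loss.
    p = sigmoid(X @ w)
    grad_x = (p - y)[:, None] * w[None, :]   # d(loss)/d(x) per example
    X_adv = X + eps * np.sign(grad_x)
    # Standard gradient step, but on the adversarially perturbed batch.
    p_adv = sigmoid(X_adv @ w)
    grad_w = X_adv.T @ (p_adv - y) / len(y)
    w -= lr * grad_w

acc = np.mean((sigmoid(X @ w) > 0.5) == (y == 1))
print(f"clean accuracy after adversarial training: {acc:.2f}")
```

The inner maximization (here a single FGSM step) is what makes this adversarial training rather than ordinary training; stronger attacks such as multi-step PGD replace that single step.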
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive, but can be easily …
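The notion of a "non-robust feature" — highly predictive on clean data yet easily flipped by a tiny perturbation — can be demonstrated with a synthetic example. The data and model weights below are invented purely for illustration:

```python
import numpy as np

# Illustration of a "non-robust feature": a tiny but perfectly predictive
# input dimension. A model relying on it is accurate on clean data yet
# completely broken by an imperceptibly small perturbation.

rng = np.random.default_rng(1)
n = 100
y = rng.choice([-1.0, 1.0], size=n)
noise = rng.normal(scale=1.0, size=(n, 4))  # large-magnitude, uninformative dims
signal = 0.01 * y                           # non-robust feature: tiny, perfectly correlated
X = np.column_stack([signal, noise])

w = np.array([100.0, 0, 0, 0, 0])           # model that leans on the non-robust feature
clean_acc = np.mean(np.sign(X @ w) == y)

# A perturbation of size 0.02 on that single dimension flips every prediction.
X_adv = X.copy()
X_adv[:, 0] -= 0.02 * y
adv_acc = np.mean(np.sign(X_adv @ w) == y)
print(clean_acc, adv_acc)  # 1.0 on clean data, 0.0 under the tiny perturbation
```

Because the feature is genuinely predictive, a classifier has every statistical incentive to use it, which is why such features are offered as an explanation for the existence of adversarial examples.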