Explainable AI: A review of machine learning interpretability methods

P Linardatos, V Papastefanopoulos, S Kotsiantis - Entropy, 2020 - mdpi.com
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …

Machine learning in cybersecurity: a comprehensive survey

D Dasgupta, Z Akhtar, S Sen - The Journal of Defense …, 2022 - journals.sagepub.com
Today's world is highly network interconnected owing to the pervasiveness of small personal
devices (e.g., smartphones) as well as large computing devices or services (e.g., cloud …

SmoothLLM: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arXiv preprint arXiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …

Word-level textual adversarial attacking as combinatorial optimization

Y Zang, F Qi, C Yang, Z Liu, M Zhang, Q Liu… - arXiv preprint arXiv …, 2019 - arxiv.org
Adversarial attacks are carried out to reveal the vulnerability of deep neural networks.
Textual adversarial attacking is challenging because text is discrete and a small perturbation …

Measure and improve robustness in NLP models: A survey

X Wang, H Wang, D Yang - arXiv preprint arXiv:2112.08313, 2021 - arxiv.org
As NLP models achieve state-of-the-art performance on benchmarks and gain wide
applications, it has become increasingly important to ensure the safe deployment of these …

Contextualized perturbation for textual adversarial attack

D Li, Y Zhang, H Peng, L Chen, C Brockett… - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial examples expose the vulnerabilities of natural language processing (NLP)
models, and can be used to evaluate and improve their robustness. Existing techniques of …

Adversarial attack and defense technologies in natural language processing: A survey

S Qiu, Q Liu, S Zhou, W Huang - Neurocomputing, 2022 - Elsevier
Recently, adversarial attack and defense technologies have made remarkable
achievements and have been widely applied in the computer vision field, promoting its rapid …

Towards a robust deep neural network against adversarial texts: A survey

W Wang, R Wang, L Wang, Z Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Deep neural networks (DNNs) have achieved remarkable success in various tasks (e.g.,
image classification, speech recognition, and natural language processing (NLP)). However …

Adversarial training with fast gradient projection method against synonym substitution based text attacks

X Wang, Y Yang, Y Deng, K He - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Adversarial training is the most empirically successful approach in improving the robustness
of deep neural networks for image classification. For text classification, however, existing …

Improving the adversarial robustness of NLP models by information bottleneck

C Zhang, X Zhou, Y Wan, X Zheng, KW Chang… - arXiv preprint arXiv …, 2022 - arxiv.org
Existing studies have demonstrated that adversarial examples can be directly attributed to
the presence of non-robust features, which are highly predictive, but can be easily …