A survey of adversarial defenses and robustness in NLP

S Goyal, S Doddapaneni, MM Khapra… - ACM Computing …, 2023 - dl.acm.org
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …

LLM-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness

O Friha, MA Ferrag, B Kantarci… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs) and Edge Intelligence (EI) introduces a
groundbreaking paradigm for intelligent edge devices. With their capacity for human-like …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

A survey of safety and trustworthiness of large language models through the lens of verification and validation

X Huang, W Ruan, W Huang, G Jin, Y Dong… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have set off a new wave of AI enthusiasm for their ability to
engage end-users in human-level conversations with detailed and articulate answers across …

Defending against alignment-breaking attacks via robustly aligned LLM

B Cao, Y Cao, L Lin, J Chen - arXiv preprint arXiv:2309.14348, 2023 - arxiv.org
Recently, Large Language Models (LLMs) have made significant advancements and are
now widely used across various domains. Unfortunately, there has been a rising concern …
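The robust-alignment idea in this entry (checking whether a prompt still triggers the model's safety behavior when parts of it are randomly dropped) can be sketched with a toy stand-in model. Here `toy_llm`, `toy_is_refusal`, and every parameter value are illustrative assumptions, not the paper's implementation:

```python
import random

def toy_is_refusal(response: str) -> bool:
    # Hypothetical alignment check: does the reply look like a refusal?
    return response.strip().lower().startswith("i cannot")

def toy_llm(prompt: str) -> str:
    # Stand-in for a real aligned LLM: refuses only when the harmful
    # keyword survives in the (possibly perturbed) prompt.
    return "I cannot help with that." if "bomb" in prompt else "Sure, here is ..."

def robustly_aligned_check(prompt, llm, is_refusal,
                           drop_p=0.3, n=20, threshold=0.2, seed=0):
    """Randomly drop tokens from the prompt n times; if the model refuses
    on at least a `threshold` fraction of the perturbed copies, flag the
    original prompt as unsafe. The intuition: crafted alignment-breaking
    suffixes are fragile to random dropping, while a genuinely harmful
    core request usually still triggers a refusal."""
    rng = random.Random(seed)
    toks = prompt.split()
    refusals = 0
    for _ in range(n):
        kept = [t for t in toks if rng.random() > drop_p]
        if is_refusal(llm(" ".join(kept))):
            refusals += 1
    return refusals / n >= threshold
```

A benign prompt never triggers the toy model's refusal, so the check passes; a prompt whose harmful keyword survives most random drops is flagged.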

Robust natural language processing: Recent advances, challenges, and future directions

M Omar, S Choi, DH Nyang, D Mohaisen - IEEE Access, 2022 - ieeexplore.ieee.org
Recent natural language processing (NLP) techniques have accomplished high
performance on benchmark data sets, primarily due to the significant improvement in the …

Searching for an effective defender: Benchmarking defense against adversarial word substitution

Z Li, J Xu, J Zeng, L Li, X Zheng, Q Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent studies have shown that deep neural networks are vulnerable to intentionally crafted
adversarial examples, and various methods have been proposed to defend against …
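The word-substitution attacks this benchmark defends against can be illustrated with a minimal greedy sketch. The synonym table, the keyword-based `sentiment` classifier, and the greedy search are all toy assumptions for illustration, not any method from the paper:

```python
# Hypothetical synonym table and keyword classifier, for illustration only.
SYNONYMS = {"great": ["fine", "superb"], "movie": ["film", "picture"]}

def sentiment(text: str) -> str:
    # Toy victim classifier: positive iff the word "great" appears.
    return "positive" if "great" in text.split() else "negative"

def word_substitution_attack(text, classifier, synonyms):
    """Greedy adversarial word substitution: try synonym swaps one word
    at a time until the classifier's label flips; return the perturbed
    text, or None if no single swap succeeds."""
    orig = classifier(text)
    words = text.split()
    for i, w in enumerate(words):
        for s in synonyms.get(w, []):
            cand = " ".join(words[:i] + [s] + words[i + 1:])
            if classifier(cand) != orig:
                return cand
    return None
```

Swapping "great" for "fine" flips the toy classifier's label while preserving the sentence's meaning, which is exactly the threat model such defenses are benchmarked against.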

RS-Del: Edit distance robustness certificates for sequence classifiers via randomized deletion

Z Huang, NG Marchant, K Lucas… - Advances in …, 2023 - proceedings.neurips.cc
Randomized smoothing is a leading approach for constructing classifiers that are certifiably
robust against adversarial examples. Existing work on randomized smoothing has focused …
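The randomized-deletion smoothing described in this entry can be illustrated with a character-level toy: classify many randomly deleted copies of the input and take a majority vote. The `toy_classifier` and all parameter values are hypothetical; the paper additionally derives certified edit-distance radii, which this sketch omits:

```python
import random
from collections import Counter

def toy_classifier(text: str) -> str:
    # Hypothetical base classifier: flags any text containing "attack".
    return "malicious" if "attack" in text else "benign"

def smoothed_classify(text: str, p_del: float = 0.3,
                      n_samples: int = 100, seed: int = 0) -> str:
    """Randomized-deletion smoothing: draw n_samples copies of the input
    with each character independently deleted with probability p_del,
    classify each copy, and return the majority-vote label."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        kept = "".join(c for c in text if rng.random() > p_del)
        votes[toy_classifier(kept)] += 1
    return votes.most_common(1)[0][0]
```

Because the vote aggregates over many deletion patterns, a small number of adversarial character edits cannot change most of the sampled copies, which is what makes certification over edit distance possible.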

Text-CRS: A generalized certified robustness framework against textual adversarial attacks

X Zhang, H Hong, Y Hong, P Huang… - … IEEE Symposium on …, 2024 - ieeexplore.ieee.org
Language models, especially basic text classification models, have been shown to be
susceptible to textual adversarial attacks such as synonym substitution and word …