A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability

X Huang, D Kroening, W Ruan, J Sharp, Y Sun… - Computer Science …, 2020 - Elsevier
In the past few years, significant progress has been made on deep neural networks (DNNs)
in achieving human-level performance on several long-standing tasks. With the broader …

Deep reinforcement learning verification: a survey

M Landers, A Doryab - ACM Computing Surveys, 2023 - dl.acm.org
Deep reinforcement learning (DRL) has proven capable of superhuman performance on
many complex tasks. To achieve this success, DRL algorithms train a decision-making agent …

RobustBench: a standardized adversarial robustness benchmark

F Croce, M Andriushchenko, V Sehwag… - arXiv preprint arXiv …, 2020 - arxiv.org
As a research community, we are still lacking a systematic understanding of the progress on
adversarial robustness, which often makes it hard to identify the most promising ideas in …
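For orientation, a minimal sketch of pulling a leaderboard model with the robustbench package and checking clean accuracy (assumes `pip install robustbench`; 'Carmon2019Unlabeled' is one leaderboard entry and may change):

```python
# Sketch: load a RobustBench leaderboard model and measure clean accuracy
# on a few CIFAR-10 test images.
import torch
from robustbench.utils import load_model
from robustbench.data import load_cifar10

model = load_model(model_name='Carmon2019Unlabeled',
                   dataset='cifar10', threat_model='Linf')
model.eval()

x_test, y_test = load_cifar10(n_examples=64)
with torch.no_grad():
    preds = model(x_test).argmax(dim=1)
print(f'clean accuracy on 64 examples: {(preds == y_test).float().mean():.2%}')
```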

TrustLLM: Trustworthiness in large language models

L Sun, Y Huang, H Wang, S Wu, Q Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Uncovering the limits of adversarial training against norm-bounded adversarial examples

S Gowal, C Qin, J Uesato, T Mann, P Kohli - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial training and its variants have become de facto standards for learning robust
deep neural networks. In this paper, we explore the landscape around adversarial training in …
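As context for this line of work, a generic PGD-based adversarial-training step looks roughly like the sketch below (the standard recipe, not this paper's exact training setup; `model`, `optimizer`, `eps`, and `alpha` are placeholders, and inputs are assumed to lie in [0, 1]):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: L-inf PGD starting from a random point in the ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        # Project back so x + delta stays a valid image.
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()

def train_step(model, optimizer, x, y):
    x_adv = pgd_attack(model, x, y)           # inner maximization
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)   # outer minimization
    loss.backward()
    optimizer.step()
    return loss.item()
```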

Structure invariant transformation for better adversarial transferability

X Wang, Z Zhang, J Zhang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Given the severe vulnerability of deep neural networks (DNNs) to adversarial
examples, there is an urgent need for an effective adversarial attack to identify the …
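The paper's idea of applying structure-preserving transformations independently per image block can be illustrated as below (a rough sketch with made-up transform choices; the authors' actual transform set and attack loop differ):

```python
import torch
import torch.nn.functional as F

def block_transform(x, blocks=3):
    """Apply an independent, content-preserving random transform per block
    of a BxCxHxW batch, keeping the overall image structure intact."""
    out = x.clone()
    h, w = x.shape[-2:]
    hs, ws = h // blocks, w // blocks
    for i in range(blocks):
        for j in range(blocks):
            rs, cs = slice(i * hs, (i + 1) * hs), slice(j * ws, (j + 1) * ws)
            op = torch.randint(0, 3, ()).item()
            if op == 0:    # horizontal flip within the block
                out[:, :, rs, cs] = out[:, :, rs, cs].flip(-1)
            elif op == 1:  # mild Gaussian noise
                out[:, :, rs, cs] = (out[:, :, rs, cs]
                    + 0.03 * torch.randn_like(out[:, :, rs, cs])).clamp(0, 1)
            # op == 2: leave the block unchanged
    return out

def transform_averaged_grad(model, x, y, copies=5):
    """Average input gradients over several transformed copies; such
    averaged gradients tend to transfer better across models."""
    x = x.clone().requires_grad_(True)
    loss = sum(F.cross_entropy(model(block_transform(x)), y)
               for _ in range(copies)) / copies
    loss.backward()
    return x.grad
```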

Globally-robust neural networks

K Leino, Z Wang, M Fredrikson - … Conference on Machine …, 2021 - proceedings.mlr.press
The threat of adversarial examples has motivated work on training certifiably robust neural
networks to facilitate efficient verification of local robustness at inference time. We formalize …
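The inference-time check behind Lipschitz-margin certification is simple; a minimal version, assuming a known global L2 Lipschitz bound L on every logit difference (which the paper's training procedure is designed to maintain), is:

```python
import torch

def certify(logits, L, eps):
    """logits: (B, K). If the margin between the top two logits exceeds
    L * eps, no perturbation of L2 norm <= eps can flip the prediction."""
    top2 = logits.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]
    return margin > L * eps   # True -> certifiably robust at radius eps
```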

SoK: Certified robustness for deep neural networks

L Li, T Xie, B Li - 2023 IEEE Symposium on Security and Privacy …, 2023 - ieeexplore.ieee.org
Great advances in deep neural networks (DNNs) have led to state-of-the-art performance on
a wide range of tasks. However, recent studies have shown that DNNs are vulnerable to …
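One concrete family the SoK covers is probabilistic certification via randomized smoothing; its prediction side reduces to a majority vote under Gaussian input noise (a simplified sketch: the full method also requires abstention logic and confidence bounds on the vote counts):

```python
import torch
import torch.nn.functional as F

def smoothed_predict(model, x, num_classes, sigma=0.25, n=100):
    """Majority vote of the base classifier over n Gaussian perturbations,
    approximating g(x) = argmax_c P(f(x + N(0, sigma^2 I)) = c)."""
    counts = torch.zeros(x.shape[0], num_classes)
    with torch.no_grad():
        for _ in range(n):
            preds = model(x + sigma * torch.randn_like(x)).argmax(dim=1)
            counts += F.one_hot(preds, num_classes).float()
    return counts.argmax(dim=1)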

PRIMA: general and precise neural network certification via scalable convex hull approximations

MN Müller, G Makarchuk, G Singh, M Püschel… - Proceedings of the …, 2022 - dl.acm.org
Formal verification of neural networks is critical for their safe adoption in real-world
applications. However, designing a precise and scalable verifier which can handle different …
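PRIMA's multi-neuron convex hulls are much tighter than single-neuron relaxations; as a baseline illustration of the kind of object any such verifier propagates, interval bound propagation through one affine-plus-ReLU layer looks like this (explicitly not PRIMA's method):

```python
import numpy as np

def affine_bounds(W, b, lo, hi):
    """Elementwise bounds of W @ x + b over the box x in [lo, hi]."""
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    return (W_pos @ lo + W_neg @ hi + b,   # lower bound
            W_pos @ hi + W_neg @ lo + b)   # upper bound

def relu_bounds(lo, hi):
    return np.maximum(lo, 0), np.maximum(hi, 0)

# Toy example: certify an output stays positive for all x in the eps-box.
W, b = np.array([[1.0, -2.0]]), np.array([0.5])
x0, eps = np.array([1.0, 0.0]), 0.1
lo, hi = affine_bounds(W, b, x0 - eps, x0 + eps)
lo, hi = relu_bounds(lo, hi)
print('certified positive' if lo[0] > 0 else 'inconclusive')
```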

Efficient adversarial training without attacking: Worst-case-aware robust reinforcement learning

Y Liang, Y Sun, R Zheng… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recent studies reveal that a well-trained deep reinforcement learning (RL) policy can be
particularly vulnerable to adversarial perturbations on input observations. Therefore, it is …
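The threat model here is perturbation of the agent's observations; a one-step FGSM-style illustration of such an attack (generic, not the paper's worst-case-aware training method; `policy` is assumed to map observations to action logits):

```python
import torch
import torch.nn.functional as F

def perturb_obs(policy, obs, eps=0.05):
    """One-step L-inf attack on observations: push down the log-probability
    of the action the policy currently prefers."""
    obs = obs.clone().requires_grad_(True)
    logits = policy(obs)
    action = logits.argmax(dim=-1, keepdim=True)
    loss = F.log_softmax(logits, dim=-1).gather(-1, action).sum()
    loss.backward()
    return (obs - eps * obs.grad.sign()).detach()
```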