Scalable agent alignment via reward modeling: a research direction

J Leike, D Krueger, T Everitt, M Martic, V Maini… - arXiv preprint arXiv …, 2018 - arxiv.org
One obstacle to applying reinforcement learning algorithms to real-world problems is the
lack of suitable reward functions. Designing such reward functions is difficult in part because …

Taxonomy of machine learning safety: A survey and primer

S Mohseni, H Wang, C Xiao, Z Yu, Z Wang… - ACM Computing …, 2022 - dl.acm.org
The open-world deployment of Machine Learning (ML) algorithms in safety-critical
applications such as autonomous vehicles needs to address a variety of ML vulnerabilities …

Certified adversarial robustness via randomized smoothing

J Cohen, E Rosenfeld, Z Kolter - International Conference on …, 2019 - proceedings.mlr.press
We show how to turn any classifier that classifies well under Gaussian noise into a new
classifier that is certifiably robust to adversarial perturbations under the L2 norm. While this …
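
As a concrete illustration of the smoothing construction the snippet describes, here is a minimal Monte Carlo sketch in Python, assuming a hypothetical base classifier `f` that maps a single input to an integer label (names and defaults here are illustrative, not the paper's code):

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict(f, x, sigma, n=1000, rng=None):
    """Monte Carlo prediction for the smoothed classifier
    g(x) = argmax_c P(f(x + noise) = c), noise ~ N(0, sigma^2 I).
    `f` is a hypothetical base classifier: f(x) -> class label (int).
    Returns (top_class, estimated_top_class_probability)."""
    rng = rng or np.random.default_rng()
    votes = np.bincount(
        [f(x + sigma * rng.standard_normal(x.shape)) for _ in range(n)]
    )
    top = int(np.argmax(votes))
    return top, votes[top] / n

def certified_radius(p_lower, sigma):
    """Certified l2 radius when the top-class probability is at least
    p_lower > 1/2: R = sigma * Phi^{-1}(p_lower), i.e. Cohen et al.'s
    certificate with the runner-up probability bounded by 1 - p_lower."""
    return sigma * norm.ppf(p_lower)
```

In the paper, the plug-in probability estimate is replaced by a one-sided binomial confidence lower bound before the radius is computed; the version above is only for illustration.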

Adversarial GLUE: A multi-task benchmark for robustness evaluation of language models

B Wang, C Xu, S Wang, Z Gan, Y Cheng, J Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale pre-trained language models have achieved tremendous success across a
wide range of natural language understanding (NLU) tasks, even surpassing human …

Provably robust deep learning via adversarially trained smoothed classifiers

H Salman, J Li, I Razenshteyn… - Advances in neural …, 2019 - proceedings.neurips.cc
Recent works have shown the effectiveness of randomized smoothing as a scalable
technique for building neural network-based classifiers that are provably robust to ℓ2 …
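
For reference, the ℓ2 certificate that this paper and Cohen et al. above both build on (Theorem 1 of Cohen et al.): if, under Gaussian noise of scale σ, the base classifier returns class A with probability at least p_A and every other class with probability at most p_B, the smoothed classifier is constant within the radius

```latex
R = \frac{\sigma}{2}\left( \Phi^{-1}(p_A) - \Phi^{-1}(p_B) \right)
```

where Φ⁻¹ is the inverse of the standard Gaussian CDF.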

Robustness may be at odds with accuracy

D Tsipras, S Santurkar, L Engstrom, A Turner… - arXiv preprint arXiv …, 2018 - arxiv.org
We show that there may exist an inherent tension between the goal of adversarial
robustness and that of standard generalization. Specifically, training robust models may not …
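
Concretely, the "adversarial robustness" goal in the snippet is the standard min-max (robust risk) training objective, written here for an ℓ∞ threat model of radius ε:

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\left[ \max_{\|\delta\|_{\infty} \le \varepsilon} \mathcal{L}(\theta,\, x + \delta,\, y) \right]
```

The tradeoff the abstract alludes to arises because the inner maximization can make weakly predictive features useless to a robust model, features a standard model would happily exploit for accuracy.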

Certified robustness to adversarial examples with differential privacy

M Lecuyer, V Atlidakis, R Geambasu… - … IEEE Symposium on …, 2019 - ieeexplore.ieee.org
Adversarial examples that fool machine learning models, particularly deep neural networks,
have been a topic of intense research interest, with attacks and defenses being developed …

When does contrastive learning preserve adversarial robustness from pretraining to finetuning?

L Fan, S Liu, PY Chen, G Zhang… - Advances in neural …, 2021 - proceedings.neurips.cc
Contrastive learning (CL) can learn generalizable feature representations and achieve
state-of-the-art performance on downstream tasks by finetuning a linear classifier on top of it …
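
The snippet's "finetuning a linear classifier on top" refers to the standard linear-probe protocol. A minimal sketch, assuming a hypothetical pretrained `encoder` module and a labeled downstream `train_loader`; all names are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

def linear_probe(encoder, train_loader, feat_dim, num_classes,
                 epochs=10, lr=1e-3, device="cpu"):
    """Freeze the (contrastively) pretrained encoder and train only a
    linear classifier on its features. `encoder` and `train_loader`
    are hypothetical placeholders."""
    encoder.eval().to(device)
    for p in encoder.parameters():
        p.requires_grad_(False)          # frozen backbone

    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = encoder(x)           # fixed representations
            loss = loss_fn(head(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```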

On the effectiveness of interval bound propagation for training verifiably robust models

S Gowal, K Dvijotham, R Stanforth, R Bunel… - arXiv preprint arXiv …, 2018 - arxiv.org
Recent work has shown that it is possible to train deep neural networks that are provably
robust to norm-bounded adversarial perturbations. Most of these methods are based on …
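
Interval bound propagation itself is simple to state: carry an elementwise lower/upper bound for every activation through the network. A minimal numpy sketch for affine and ReLU layers (the composition the paper trains with), using hypothetical weights:

```python
import numpy as np

def ibp_affine(l, u, W, b):
    """Propagate elementwise bounds l <= x <= u through x -> W x + b.
    With center mu = (u + l)/2 and radius r = (u - l)/2, the tightest
    interval is W mu + b +/- |W| r."""
    mu, r = (u + l) / 2, (u - l) / 2
    center = W @ mu + b
    radius = np.abs(W) @ r
    return center - radius, center + radius

def ibp_relu(l, u):
    """ReLU is monotone, so bounds pass through directly."""
    return np.maximum(l, 0), np.maximum(u, 0)

# Example: logit bounds for a 2-layer net under an l_inf ball of radius eps.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((3, 8)), np.zeros(3)
x, eps = rng.standard_normal(4), 0.1
l, u = ibp_affine(x - eps, x + eps, W1, b1)
l, u = ibp_relu(l, u)
l, u = ibp_affine(l, u, W2, b2)   # bounds on the logits
```

If the lower bound of the true class's logit exceeds the upper bound of every other logit, no perturbation in the ball can change the prediction.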

Semidefinite relaxations for certifying robustness to adversarial examples

A Raghunathan, J Steinhardt… - Advances in neural …, 2018 - proceedings.neurips.cc
Despite their impressive performance on diverse tasks, neural networks fail catastrophically
in the presence of adversarial inputs—imperceptibly but adversarially perturbed versions of …
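
To make "adversarially perturbed versions" concrete, below is the classic fast gradient sign method of Goodfellow et al., a one-step attack and not this paper's semidefinite certification approach:

```python
import torch

def fgsm(model, x, y, eps, loss_fn=torch.nn.functional.cross_entropy):
    """Fast gradient sign method (Goodfellow et al.): a one-step l_inf
    attack, x_adv = x + eps * sign(grad_x L(model(x), y)). Illustrates
    the adversarial inputs the snippet describes."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```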