An overview of catastrophic AI risks

D Hendrycks, M Mazeika, T Woodside - arXiv preprint arXiv:2306.12001, 2023 - arxiv.org
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among
experts, policymakers, and world leaders regarding the potential for increasingly advanced …

Wild patterns reloaded: A survey of machine learning security against training data poisoning

AE Cinà, K Grosse, A Demontis, S Vascon… - ACM Computing …, 2023 - dl.acm.org
The success of machine learning is fueled by the increasing availability of computing power
and large training datasets. The training data is used to learn new models or update existing …

Unsolved problems in ML safety

D Hendrycks, N Carlini, J Schulman… - arXiv preprint arXiv …, 2021 - arxiv.org
Machine learning (ML) systems are rapidly increasing in size, are acquiring new
capabilities, and are increasingly deployed in high-stakes settings. As with other powerful …

TrojDiff: Trojan attacks on diffusion models with diverse targets

W Chen, D Song, B Li - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Diffusion models have achieved great success in a range of tasks, such as image synthesis
and molecule design. As such successes hinge on large-scale training data collected from …

BadEncoder: Backdoor attacks to pre-trained encoders in self-supervised learning

J Jia, Y Liu, NZ Gong - 2022 IEEE Symposium on Security and …, 2022 - ieeexplore.ieee.org
Self-supervised learning in computer vision aims to pre-train an image encoder using a
large amount of unlabeled images or (image, text) pairs. The pre-trained image encoder can …

Dataset security for machine learning: Data poisoning, backdoor attacks, and defenses

M Goldblum, D Tsipras, C Xie, X Chen… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
As machine learning systems grow in scale, so do their training data requirements, forcing
practitioners to automate and outsource the curation of training data in order to achieve state …

Hidden Killer: Invisible textual backdoor attacks with syntactic trigger

F Qi, M Li, Y Chen, Z Zhang, Z Liu, Y Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Backdoor attacks are a kind of insidious security threat against machine learning models.
After being injected with a backdoor in training, the victim model will produce adversary …

BadNL: Backdoor attacks against NLP models with semantic-preserving improvements

X Chen, A Salem, D Chen, M Backes, S Ma… - Proceedings of the 37th …, 2021 - dl.acm.org
Deep neural networks (DNNs) have progressed rapidly during the past decade and have
been deployed in various real-world applications. Meanwhile, DNN models have been …

ONION: A simple and effective defense against textual backdoor attacks

F Qi, Y Chen, M Li, Y Yao, Z Liu, M Sun - arXiv preprint arXiv:2011.10369, 2020 - arxiv.org
Backdoor attacks are an emergent training-time threat to deep neural networks
(DNNs). They can manipulate the output of DNNs and are highly insidious. In the …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …