An overview of catastrophic AI risks

D Hendrycks, M Mazeika, T Woodside - arXiv preprint arXiv:2306.12001, 2023 - arxiv.org
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among
experts, policymakers, and world leaders regarding the potential for increasingly advanced …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Regulating ChatGPT and other large generative AI models

P Hacker, A Engel, M Mauer - Proceedings of the 2023 ACM Conference …, 2023 - dl.acm.org
Large generative AI models (LGAIMs), such as ChatGPT, GPT-4 or Stable Diffusion, are
rapidly transforming the way we communicate, illustrate, and create. However, AI regulation …

On the exploitability of instruction tuning

M Shu, J Wang, C Zhu, J Geiping… - Advances in Neural …, 2023 - proceedings.neurips.cc
Instruction tuning is an effective technique to align large language models (LLMs) with
human intent. In this work, we investigate how an adversary can exploit instruction tuning by …

LLM self defense: By self examination, LLMs know they are being tricked

A Helbling, M Phute, M Hull, DH Chau - arXiv preprint arXiv:2308.07308, 2023 - arxiv.org
Large language models (LLMs) have skyrocketed in popularity in recent years due to their
ability to generate high-quality text in response to human prompting. However, these models …

Who wrote this code? Watermarking for code generation

T Lee, S Hong, J Ahn, I Hong, H Lee, S Yun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models for code have recently shown remarkable performance in
generating executable code. However, this rapid advancement has been accompanied by …

A survey on LLM-generated text detection: Necessity, methods, and future directions

J Wu, S Yang, R Zhan, Y Yuan, DF Wong… - arXiv preprint arXiv …, 2023 - arxiv.org
The powerful ability to understand, follow, and generate complex language emerging from
large language models (LLMs) makes LLM-generated text flood many areas of our daily …

Industrial practitioners' mental models of adversarial machine learning

L Bieringer, K Grosse, M Backes, B Biggio… - … Symposium on Usable …, 2022 - usenix.org
Although machine learning is widely used in practice, little is known about practitioners'
understanding of potential security challenges. In this work, we close this substantial gap …

A survey on explainable AI for 6G O-RAN: Architecture, use cases, challenges and research directions

B Brik, H Chergui, L Zanzi, F Devoti, A Ksentini… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent O-RAN specifications promote the evolution of RAN architecture by function
disaggregation, adoption of open interfaces, and instantiation of a hierarchical closed-loop …

Warning: Humans cannot reliably detect speech deepfakes

KT Mai, S Bray, T Davies, LD Griffin - Plos one, 2023 - journals.plos.org
Speech deepfakes are artificial voices generated by machine learning models. Previous
literature has highlighted deepfakes as one of the biggest security threats arising from …