The WMDP benchmark: Measuring and reducing malicious use with unlearning

N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti… - arXiv preprint arXiv …, 2024 - arxiv.org
The White House Executive Order on Artificial Intelligence highlights the risks of large
language models (LLMs) empowering malicious actors in developing biological, cyber, and …

Managing AI risks in an era of rapid progress

Y Bengio, G Hinton, A Yao, D Song… - arXiv preprint arXiv …, 2023 - blog.biocomm.ai
In this short consensus paper, we outline risks from upcoming, advanced AI systems. We
examine large-scale social harms and malicious uses, as well as an irreversible loss of …

Refusal in language models is mediated by a single direction

A Arditi, O Obeso, A Syed, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational large language models are fine-tuned for both instruction-following and
safety, resulting in models that obey benign requests but refuse harmful ones. While this …

Managing extreme AI risks amid rapid progress

Y Bengio, G Hinton, A Yao, D Song, P Abbeel, T Darrell… - Science, 2024 - science.org
Artificial intelligence (AI) is progressing rapidly, and companies are shifting their focus to
developing generalist AI systems that can autonomously act and pursue goals. Increases in …

Protecting society from AI misuse: when are restrictions on capabilities warranted?

M Anderljung, J Hazell, M von Knebel - AI & SOCIETY, 2024 - Springer
Artificial intelligence (AI) systems will increasingly be used to cause harm as they grow more
capable. In fact, AI systems are already starting to help automate fraudulent activities, violate …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

LoRA fine-tuning efficiently undoes safety training in Llama 2-Chat 70B

S Lermen, C Rogers-Smith, J Ladish - arXiv preprint arXiv:2310.20624, 2023 - arxiv.org
AI developers often apply safety alignment procedures to prevent the misuse of their AI
systems. For example, before Meta released Llama 2-Chat, a collection of instruction fine …

Connecting the dots: LLMs can infer and verbalize latent structure from disparate training data

J Treutlein, D Choi, J Betley, S Marks, C Anil… - arXiv preprint arXiv …, 2024 - arxiv.org
One way to address safety risks from large language models (LLMs) is to censor dangerous
knowledge from their training data. While this removes the explicit information, implicit …

Acceptable Use Policies for Foundation Models

K Klyman - Proceedings of the AAAI/ACM Conference on AI, Ethics …, 2024 - ojs.aaai.org
As foundation models have accumulated hundreds of millions of users, developers have
begun to take steps to prevent harmful types of uses. One salient intervention that foundation …

On the limitations of compute thresholds as a governance strategy

S Hooker - arXiv preprint arXiv:2407.05694, 2024 - arxiv.org
At face value, this essay is about understanding a fairly esoteric governance tool called
compute thresholds. However, in order to grapple with whether these thresholds will achieve …