An overview of catastrophic AI risks

D Hendrycks, M Mazeika, T Woodside - arXiv preprint arXiv:2306.12001, 2023 - arxiv.org
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among
experts, policymakers, and world leaders regarding the potential for increasingly advanced …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Regulating ChatGPT and other large generative AI models

P Hacker, A Engel, M Mauer - Proceedings of the 2023 ACM Conference …, 2023 - dl.acm.org
Large generative AI models (LGAIMs), such as ChatGPT, GPT-4 or Stable Diffusion, are
rapidly transforming the way we communicate, illustrate, and create. However, AI regulation …

On the exploitability of instruction tuning

M Shu, J Wang, C Zhu, J Geiping… - Advances in Neural …, 2023 - proceedings.neurips.cc
Instruction tuning is an effective technique to align large language models (LLMs) with
human intent. In this work, we investigate how an adversary can exploit instruction tuning by …

LLM self defense: By self examination, LLMs know they are being tricked

A Helbling, M Phute, M Hull, DH Chau - arXiv preprint arXiv:2308.07308, 2023 - arxiv.org
Large language models (LLMs) have skyrocketed in popularity in recent years due to their
ability to generate high-quality text in response to human prompting. However, these models …

Who wrote this code? Watermarking for code generation

T Lee, S Hong, J Ahn, I Hong, H Lee, S Yun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models for code have recently shown remarkable performance in
generating executable code. However, this rapid advancement has been accompanied by …

A survey on LLM-generated text detection: Necessity, methods, and future directions

J Wu, S Yang, R Zhan, Y Yuan, DF Wong… - arXiv preprint arXiv …, 2023 - arxiv.org
The powerful ability to understand, follow, and generate complex language emerging from
large language models (LLMs) makes LLM-generated text flood many areas of our daily …

Industrial practitioners' mental models of adversarial machine learning

L Bieringer, K Grosse, M Backes, B Biggio… - … Symposium on Usable …, 2022 - usenix.org
Although machine learning is widely used in practice, little is known about practitioners'
understanding of potential security challenges. In this work, we close this substantial gap …

A survey on explainable AI for 6G O-RAN: Architecture, use cases, challenges and research directions

B Brik, H Chergui, L Zanzi, F Devoti, A Ksentini… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent O-RAN specifications promote the evolution of RAN architecture by function
disaggregation, adoption of open interfaces, and instantiation of a hierarchical closed-loop …

Warning: Humans cannot reliably detect speech deepfakes

KT Mai, S Bray, T Davies, LD Griffin - Plos one, 2023 - journals.plos.org
Speech deepfakes are artificial voices generated by machine learning models. Previous
literature has highlighted deepfakes as one of the biggest security threats arising from …