An overview of catastrophic AI risks

D Hendrycks, M Mazeika, T Woodside - arXiv preprint arXiv:2306.12001, 2023 - arxiv.org
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among
experts, policymakers, and world leaders regarding the potential for increasingly advanced …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Can LLM-generated misinformation be detected?

C Chen, K Shu - arXiv preprint arXiv:2309.13788, 2023 - arxiv.org
The advent of Large Language Models (LLMs) has made a transformative impact. However,
the potential that LLMs such as ChatGPT can be exploited to generate misinformation has …

From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Managing AI risks in an era of rapid progress

Y Bengio, G Hinton, A Yao, D Song, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org
In this short consensus paper, we outline risks from upcoming, advanced AI systems. We
examine large-scale social harms and malicious uses, as well as an irreversible loss of …

Managing extreme AI risks amid rapid progress

Y Bengio, G Hinton, A Yao, D Song, P Abbeel, T Darrell… - Science, 2024 - science.org
Artificial intelligence (AI) is progressing rapidly, and companies are shifting their focus to
developing generalist AI systems that can autonomously act and pursue goals. Increases in …

Look before you leap: An exploratory study of uncertainty measurement for large language models

Y Huang, J Song, Z Wang, H Chen, L Ma - arXiv preprint arXiv:2307.10236, 2023 - arxiv.org
The recent performance leap of Large Language Models (LLMs) opens up new
opportunities across numerous industrial applications and domains. However, erroneous …

Black-box access is insufficient for rigorous ai audits

S Casper, C Ezell, C Siegmann, N Kolt… - The 2024 ACM …, 2024 - dl.acm.org
External audits of AI systems are increasingly recognized as a key mechanism for AI
governance. The effectiveness of an audit, however, depends on the degree of access …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Red-Teaming for Generative AI: Silver Bullet or Security Theater?

M Feffer, A Sinha, ZC Lipton, H Heidari - arXiv preprint arXiv:2401.15897, 2024 - arxiv.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …