AI deception: A survey of examples, risks, and potential solutions

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …

A review of the explainability and safety of conversational agents for mental health to identify avenues for improvement

S Sarkar, M Gaur, LK Chen, M Garg… - Frontiers in Artificial …, 2023 - frontiersin.org
Virtual Mental Health Assistants (VMHAs) continuously evolve to support the overloaded
global healthcare system, which receives approximately 60 million primary care visits and 6 …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Deception abilities emerged in large language models

T Hagendorff - Proceedings of the National Academy of …, 2024 - National Acad Sciences
Large language models (LLMs) are currently at the forefront of intertwining AI systems with
human communication and everyday life. Thus, aligning them with human values is of great …

Large language model alignment: A survey

T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed remarkable progress in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …

How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions

L Pacchiardi, AJ Chan, S Mindermann… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) can" lie", which we define as outputting false statements
despite" knowing" the truth in a demonstrable sense. LLMs might" lie", for example, when …

A comprehensive survey on evaluating large language model applications in the medical industry

Y Huang, K Tang, M Chen, B Wang - arXiv preprint arXiv:2404.15777, 2024 - arxiv.org
Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs)
such as GPT and BERT have evolved significantly, impacting various industries with their …

JADE: A linguistics-based safety evaluation platform for LLMs

M Zhang, X Pan, M Yang - arXiv preprint arXiv:2311.00286, 2023 - arxiv.org
In this paper, we present JADE, a targeted linguistic fuzzing platform which strengthens the
linguistic complexity of seed questions to simultaneously and consistently break a wide …

ForecastBench: A dynamic benchmark of AI forecasting capabilities

E Karger, H Bastani, C Yueh-Han, Z Jacobs… - arXiv preprint arXiv …, 2024 - arxiv.org
Forecasts of future events are essential inputs into informed decision-making. Machine
learning (ML) systems have the potential to deliver forecasts at scale, but there is no …

Dual Process Theory for Large Language Models: An overview of using Psychology to address hallucination and reliability issues

SC Bellini-Leite - Adaptive Behavior, 2024 - journals.sagepub.com
State-of-the-art Large Language Models have recently exhibited extraordinary linguistic
abilities which have surprisingly extended to reasoning. However, responses that are …