Ai alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

Autogen: Enabling next-gen llm applications via multi-agent conversation framework

Q Wu, G Bansal, J Zhang, Y Wu, S Zhang, E Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
This technical report presents AutoGen, a new framework that enables development of LLM
applications using multiple agents that can converse with each other to solve tasks. AutoGen …

Fine-tuning aligned language models compromises safety, even when users do not intend to!

X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal… - arXiv preprint arXiv …, 2023 - arxiv.org
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …

Large language models: A survey

S Minaee, T Mikolov, N Nikzad, M Chenaghlu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have drawn a lot of attention due to their strong
performance on a wide range of natural language tasks, since the release of ChatGPT in …

Trustllm: Trustworthiness in large language models

L Sun, Y Huang, H Wang, S Wu, Q Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

The alignment problem from a deep learning perspective

R Ngo, L Chan, S Mindermann - arXiv preprint arXiv:2209.00626, 2022 - arxiv.org
In coming decades, artificial general intelligence (AGI) may surpass human capabilities at
many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to …

Can llm-generated misinformation be detected?

C Chen, K Shu - arXiv preprint arXiv:2309.13788, 2023 - arxiv.org
The advent of Large Language Models (LLMs) has made a transformative impact. However,
the potential that LLMs such as ChatGPT can be exploited to generate misinformation has …