J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

J Achiam, S Adler, S Agarwal, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many …

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state …

In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of …

Artificial intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in …

S Casper, C Ezell, C Siegmann, N Kolt… - The 2024 ACM …, 2024 - dl.acm.org
External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access …

This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories …

Manipulation is a concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our digital interactions, it is important to understand …

Automated dialogue or conversational systems are anthropomorphised by developers and personified by users. While a degree of anthropomorphism may be inevitable due to the …