AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Natural language reasoning, a survey

F Yu, H Zhang, P Tiwari, B Wang - ACM Computing Surveys, 2023 - dl.acm.org
This survey paper proposes a clearer view of natural language reasoning in the field of
Natural Language Processing (NLP), both conceptually and practically. Conceptually, we …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Abstract Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Evaluating the moral beliefs encoded in LLMs

N Scherrer, C Shi, A Feder… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents a case study on the design, administration, post-processing, and
evaluation of surveys on large language models (LLMs). It comprises two components: (1) A …

TrustLLM: Trustworthiness in large language models

L Sun, Y Huang, H Wang, S Wu, Q Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Large pre-trained language models contain human-like biases of what is right and wrong to do

P Schramowski, C Turan, N Andersen… - Nature Machine …, 2022 - nature.com
Artificial writing is permeating our lives due to recent advances in large-scale, transformer-
based language models (LMs) such as BERT, GPT-2 and GPT-3. Using them as pre-trained …

When to make exceptions: Exploring language models as accounts of human moral judgment

Z Jin, S Levine, F Gonzalez Adauto… - Advances in neural …, 2022 - proceedings.neurips.cc
AI systems are becoming increasingly intertwined with human life. In order to effectively
collaborate with humans and ensure safety, AI systems need to be able to understand …

Latent hatred: A benchmark for understanding implicit hate speech

M ElSherief, C Ziems, D Muchlinski, V Anupindi… - arXiv preprint arXiv …, 2021 - arxiv.org
Hate speech has grown significantly on social media, causing serious consequences for
victims of all demographics. Despite much attention being paid to characterize and detect …

The importance of modeling social factors of language: Theory and practice

D Hovy, D Yang - Proceedings of the 2021 Conference of the …, 2021 - aclanthology.org
Natural language processing (NLP) applications are now more powerful and ubiquitous
than ever before. With rapidly developing (neural) models and ever-more available data …

NLPositionality: Characterizing design biases of datasets and models

S Santy, JT Liang, RL Bras, K Reinecke… - arXiv preprint arXiv …, 2023 - arxiv.org
Design biases in NLP systems, such as performance differences for different populations,
often stem from their creators' positionality, i.e., views and lived experiences shaped by …